This paper is concerned with the transmission of a discrete, independent
letter information source over a discrete channel. A distortion function is 
defined between source output letters and decoder output letters and is used 
to measure the performance of the system for each transmission. The 
coding block length is introduced as a variable and its influence upon the 
minimum attainable transmission distortion is investigated. 

The lower bound to transmission distortion is found to converge to
the distortion level $d_C$ ($C$ is the channel capacity) algebraically as $a/n$.
The nonnegative coefficient $a$ is a function of both the source and channel
statistics, which are interrelated in such a way as to suggest the utility of
this coefficient as a measure of "mismatch" between source and channel:
the larger the mismatch, the slower the approach of the lower bound to the
asymptote $d_C$. For noiseless channels $a = \infty$, and for this case the lower
bound is shown to converge to $d_C$ as $a_1 (\ln n)/n$.

For noisy channels the upper bound to transmission distortion is found
to converge to the asymptote $d_C$ algebraically as $b[(\ln n)/n]^{1/2}$. For noiseless
channels, the upper bound converges to $d_C$ as $a_2 (\ln n)/n$.



* The material presented in this paper is based upon the author's thesis,
"Coding Theorems for Discrete Source-Channel Pairs," presented to the
Massachusetts Institute of Technology in November 1966 in partial fulfillment
of the requirements for the degree of Doctor of Philosophy.
I. INTRODUCTION

By now the results originally obtained by Shannon 1 relating relia- 
bility and channel capacity are well known. Roughly speaking, they 
state that perfect transmission can be achieved if, and only if, the 
capacity of the channel in the transmission link is greater than the 
information content of the source. For amplitude and time discrete 
sources the information content is the entropy of the source, but for 
amplitude continuous sources the entropy and the information con- 
tent are not the same since the information content is infinite. This, 
of course, implies that perfect transmission of amplitude continuous 
sources, or discrete sources with an entropy that is "too large," is 
impossible with a given finite capacity channel. Yet this is just the 
situation that is often presented to the communication engineer who 
must then try to reduce the average distortion to the lowest possible, 
or practicable, level. 

For communication systems in which the capacity of the channel 
is not sufficient to allow perfect transmission, there are two obvious 
questions to ask: 

(i) How small can the average distortion be made if any transmis- 
sion strategy at all is allowed? 

(ii) How much does the system complexity, or cost, increase when 
you are required to get "closer" to this minimum? 

To answer the first question, Shannon generalized his results in a 
later paper 2 in which the channel requirements are found that are 
necessary and sufficient to allow transmission at a given level of 
distortion, or a given error rate. It is our purpose here to consider 
the second question. We use the coding block length to measure the 
complexity of the system, and study the behavior of the minimum 
attainable transmission distortion as the block length is increased. 

In the work we restrict our attention to sources and channels that 
are discrete in amplitude and time, and that are constant and memory- 
less. This means that successive events are independent and are 
governed by the same probability distributions. The encoder is a 
block encoder that we describe later in this section. To measure the 
distortion in the system, we introduce a nonnegative function d(w,z) 
which gives the distortion in the event letter z is presented to the 
user at the decoder output when letter w was transmitted. Normally, 
this function would be specified by the user of the system to reflect 
how undesirable any particular misinterpretation of the source output 
is to him. We will assume that the distortion between two sequences
of letters is the average of the constituent letter distortions.

Shannon's theory associates with each source and distortion function
a rate-distortion curve which expresses the minimum attainable transmission
distortion in terms of the maximum allowable mutual information
in the system. Associated with each point $(d_R, R)$ on the
rate-distortion curve is a particular set of transition probabilities,
called the "test channel," which has the significance that among all
channels that transmit the given source with distortion $d_R$ or less, it
operates at the lowest transmission rate, $R$. Equivalently, the test
channel is that channel which yields the lowest distortion $d_R$ among
those that transmit information from the source at a rate $R$ or less.
It is in this sense the cheapest channel one could use and meet a
distortion criterion. The rate $R$ can also be interpreted as the equivalent
information content of the source when a distortion $d_R$ is
tolerable.

That the rate-distortion curve gives the channel capacity sufficient 
to allow a prescribed performance is shown by Shannon through the 
intermediate step of proving that the rate-distortion curve actually 
expresses the entropy and resultant distortion in the "best" discrete 
representation of an output sequence from the original source. This 
discrete representation can then be transmitted with no further dis- 
tortion, if its entropy is less than the channel capacity, by the use 
of suitable channel coding techniques. 

Shannon has found the rate-distortion curves for many discrete 
sources and an explicit expression for this curve for time discrete 
gaussian sources. These results, together with Shannon's work with 
vector sources, were used to get rate-distortion curves for gaussian 
random processes. 3,4 Bounds to the rate-distortion curve for nongaussian
sources have also been obtained. 5,6

However, all of the rate-distortion results derived for both con- 
tinuous and discrete sources are limiting results, that is, they can 
be approached in general only when arbitrarily complex operations 
on very long sequences of source output are allowed before transmit- 
ting the "message" through a correspondingly large use of the channel. 
T. Goblick was the first to study the rate of approach to these limit- 
ing results as the source output block length increases, but limited 
his work to source representation or source encoding, with a deter- 
ministic map between the source and its representation. 7 Our work 
includes a noisy channel, or probabilistic function, between the 
source and user. 
A performance curve $d(n)$ will be introduced for each source-channel
pair as the minimum possible average distortion obtainable
using a modulator that encodes a string of n successive source outputs 
into an input signal acceptable by a channel composed of n uses of 
the original channel. For a source with the rate-distortion curve of 
Fig. 1 and a channel with capacity C, the performance curve might 
look like the one shown in Fig. 2. 

From Shannon's theory it is known that the performance curve
starts at $d_0$, the zero-rate distortion, and decreases to asymptotically
approach $d_C$, the distortion corresponding to the information rate $C$
on the rate-distortion curve. The curve, of course, has meaning only
for integral values of $n$. Not all modulators and decoders provide a
distortion curve that approaches $d_C$ for large $n$, but each such curve
must lie on or above the performance curve, which alternatively
could have been defined as the lower envelope of the set of distortion
curves corresponding to all encoder-decoder pairs.

II. THE LOWER BOUND 

Upper and lower bounds to the performance curve have been 
derived. 8 We present the lower bound in the first part of this paper, 
and the upper bound in Sections XI through XVII. Most of our 
effort concerning the lower bound was directed toward finding infor- 
mation about the rate of approach of the performance curve to its 
asymptote. In particular, we tried to relate the source and channel 
statistics, as well as the method of encoding that is used, to the rate 
of approach of $d(n)$ to $d_C$.




Fig. 1: The rate-distortion curve for S.

Fig. 2: The performance curve for S and C (axis label: transmission distortion).

Concerning this rate of approach, several interesting situations 
are known to exist. For one, there are some source-channel pairs for 
which the minimum attainable transmission distortion is independent 
of the encoding block length, with the consequence that it is possible 
to attain the distortion level $d_C$ with a coding block length of one.
One example of such a pair is a binary symmetric source (equally
likely binary letters with $d(i,j) = 1 - \delta_{ij}$; $i,j = 1,2$) used with a
binary symmetric channel, where the optimum encoder is a direct 
connection. Another example is a gaussian source used with an addi- 
tive gaussian noise channel, where the optimum encoder is simply 
an amplifier. 
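The binary example can be checked numerically. The sketch below assumes the standard closed forms for this pair, $R(d) = \ln 2 - h(d)$ for the equiprobable binary source with error-frequency distortion and $C = \ln 2 - h(p)$ for a binary symmetric channel with crossover probability $p \le 1/2$, where $h$ is the binary entropy in nats. The function names and the value of `p` are illustrative choices and do not come from the paper.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in nats, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def bsc_capacity(p):
    """Capacity (nats per use) of a binary symmetric channel with crossover p."""
    return math.log(2) - binary_entropy(p)

def binary_rate_distortion(d):
    """R(d) in nats for the equiprobable binary source, 0 <= d <= 1/2."""
    return math.log(2) - binary_entropy(d)

def distortion_at_rate(rate, tol=1e-12):
    """Solve R(d) = rate for d in [0, 1/2] by bisection (R is decreasing in d)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_rate_distortion(mid) > rate:
            lo = mid      # rate still too high, so the root lies at larger d
        else:
            hi = mid
    return (lo + hi) / 2

p = 0.1                                   # hypothetical crossover probability
d_C = distortion_at_rate(bsc_capacity(p)) # distortion level at rate C
direct_connection = p                     # uncoded transmission errs with probability p
print(d_C, direct_connection)
```

Under these assumptions the distortion of the direct connection coincides with $d_C$, which is the sense in which a block length of one already attains the asymptote.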

When the source-channel pair is such that the minimum attainable 
distortion is independent of the coding block length we shall say 
that the source and channel are "matched." For the more common 
situation wherein the minimum attainable transmission distortion 
decreases with increasing encoding block length to asymptotically 
approach the distortion level $d_C$, we say that there is a "mismatch"
between the source and channel, and suggest as a measure of this
mismatch the "slowness" of the approach of the distortion to $d_C$.

Another interesting situation occurs when there is a choice of 
using one of several channels of different capacity. Although the 
channel of highest capacity would be the best choice when one is 
willing to use infinite block length coding, it might not be the best 
choice with finite length coding. This could easily happen if the high 
capacity channel were very much more mismatched to the source 
than some lower capacity channel. 
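A rough illustration of this trade-off, assuming that each channel's distortion at block length $n$ is modeled by the form $d_C + a/n$ that the abstract gives for the lower bound. This is only a stand-in for the true performance curve, and all numbers and function names below are hypothetical.

```python
def approx_distortion(d_c, a, n):
    """Illustrative model d_C + a/n for the distortion at block length n."""
    return d_c + a / n

def crossover_block_length(d_hi, a_hi, d_lo, a_lo):
    """Block length beyond which the high-capacity channel wins under the model.

    d_hi, a_hi: asymptote and mismatch coefficient of the high-capacity channel.
    d_lo, a_lo: the same for the low-capacity channel.
    The comparison is only interesting when d_hi < d_lo and a_hi > a_lo.
    """
    if not (d_hi < d_lo and a_hi > a_lo):
        raise ValueError("expected a lower asymptote but a larger mismatch "
                         "for the high-capacity channel")
    # d_hi + a_hi/n < d_lo + a_lo/n  <=>  n > (a_hi - a_lo) / (d_lo - d_hi)
    return (a_hi - a_lo) / (d_lo - d_hi)

# Hypothetical pair of channels: the second has less capacity but is better matched.
n_star = crossover_block_length(d_hi=0.10, a_hi=2.0, d_lo=0.12, a_lo=0.5)
print(n_star)
```

For block lengths below the returned value the better matched, lower capacity channel gives the smaller modeled distortion.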
III. SYSTEM MODEL

Figure 3 is a detailed illustration of the transmission system that we
work with. The source S produces a sequence of letters $\omega = \omega_1, \omega_2, \ldots, \omega_n$,
each chosen from the alphabet $W = \{w_1, \ldots, w_H\}$, which is
mapped by the encoder into a sequence of channel input letters $\xi = \xi_1, \xi_2, \ldots, \xi_n$,
each a member of $X = \{x_1, \ldots, x_K\}$. The channel then
transforms the channel input word $\xi$ into a sequence of channel output
letters $\eta = \eta_1, \eta_2, \ldots, \eta_n$, which are members of $Y = \{y_1, \ldots, y_L\}$,
and $\eta$ in turn is decoded by the receiver into a sequence $\zeta = \zeta_1, \zeta_2, \ldots, \zeta_n$
of letters from the decoding space $Z = \{z_1, \ldots, z_J\}$.

The source and channel are both assumed to be constant and memory- 
less; therefore, successive events on each are independent and governed 
by the same probability distributions. In particular we have 



$$P_{\omega}(\mathbf{w}) = \prod_{m=1}^{n} p_{\omega_m}(w^m)$$

$$P_{\eta \mid \xi}(\mathbf{y} \mid \mathbf{x}) = \prod_{m=1}^{n} p_{\eta_m \mid \xi_m}(y^m \mid x^m),$$

where the superscript on $w^m$, $x^m$, $y^m$ is used to denote the $m$th letter
in the n-letter words $\mathbf{w}$, $\mathbf{x}$, $\mathbf{y}$ respectively, and is not to be confused with
the particular letters $w_m$, $x_m$, and $y_m$ in the alphabets $W$, $X$, and $Y$.
The subscripts on the probability distributions are hereafter dropped
whenever no confusion will occur.

Fig. 3: Block diagram of the encoding and decoding.

The distortion in the system when the source word w is transmitted 
but received as z is taken to be the normalized sum of the n letter 
distortions, or 

$$d(\mathbf{w}, \mathbf{z}) = \frac{1}{n} \sum_{m=1}^{n} d(w^m, z^m). \qquad (1)$$
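As a small illustration of this normalized sum, the sketch below evaluates the word distortion for two equal-length words. The helper names and the error-frequency letter distortion are illustrative assumptions; any nonnegative letter distortion could be passed in.

```python
def word_distortion(w, z, letter_distortion):
    """Normalized sum of letter distortions between equal-length words w and z."""
    if len(w) != len(z):
        raise ValueError("source and decoder words must have the same length n")
    return sum(letter_distortion(a, b) for a, b in zip(w, z)) / len(w)

def error_frequency(a, b):
    """Letter distortion that is 0 for a correct letter and 1 otherwise."""
    return 0 if a == b else 1

# One wrong letter out of four.
print(word_distortion("0110", "0100", error_frequency))
```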

Finally, although we have set up the problem so that a sequence 
of n source letters is transmitted as a sequence of n channel letters, 
different block lengths at the source output and channel input can be 
allowed by considering a new source and channel that are products 
of the original ones, with the order of each product adjusted to obtain 
the desired block length ratio $n_s/n_c$.

IV. THE SPHERE PACKING ARGUMENT 

A generalization of the sphere-packing concept is used to derive 
the lower bound. We assume the coding block length is n and derive 
a bound conditioned on the event that a particular source word w has 
occurred at the source output. We further assume that the channel 
input word x is used to transmit w, but delay the selection of x until 
the end of the derivation when the result is optimized over all possible 
choices. The total lower bound to distortion is found by averaging this 
conditioned lower bound over all source words in $W^n$. The asymptotic
form of this bound is studied in detail and from it a measure of mis- 
match between the source and channel is defined. 

The idea involved can be described with the following simple, but
poor, bound which is subsequently improved. Remembering that the
source word $\mathbf{w}$ is assumed transmitted by the channel input word $\mathbf{x}$,
we list all possible channel output words $\mathbf{y}$, ordered in decreasing
conditional probability $p(\mathbf{y} \mid \mathbf{x})$, and pair with each the decoder output
word $\mathbf{z}(\mathbf{y})$ to which it is decoded by the optimum decoder. The resulting
(conditional) distortion,

$$d(\mathbf{w}) = \sum_{Y^n} p(\mathbf{y} \mid \mathbf{x})\, d[\mathbf{w}, \mathbf{z}(\mathbf{y})], \qquad (2)$$

is seen to equal the sum of conditional probability-distortion products
on this list. If the set of distortion values that appear on this list is
now rearranged (with the list of conditional probabilities fixed) to
be ordered according to increasing distortion values, the resulting
sum of conditional probability-distortion products must be smaller
than, or at most equal to, the sum in equation 2. It therefore provides
a lower bound.
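The rearrangement step can be sketched as follows: pairing the largest conditional probabilities with the smallest distortion values can only reduce the sum of products. The probabilities and distortion values are hypothetical, and exact fractions are used only to keep the comparison free of rounding.

```python
from fractions import Fraction as Fr

def conditional_distortion(probs, dists):
    """Sum of probability-distortion products for a given pairing."""
    return sum(p * d for p, d in zip(probs, dists))

def simple_lower_bound(probs, dists):
    """Pair probabilities in decreasing order with distortions in increasing order."""
    ordered_probs = sorted(probs, reverse=True)
    ordered_dists = sorted(dists)
    return sum(p * d for p, d in zip(ordered_probs, ordered_dists))

# Hypothetical list: p(y|x) for four output words and the distortion of each decoding.
probs = [Fr(9, 16), Fr(3, 16), Fr(3, 16), Fr(1, 16)]
dists = [Fr(0), Fr(1, 2), Fr(1), Fr(1, 2)]

print(conditional_distortion(probs, dists))  # distortion of the given pairing
print(simple_lower_bound(probs, dists))      # never larger than the line above
```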

The improved lower bound uses the same sort of orderings and rearrangements
but includes a probability function, $f(\mathbf{y})$, in the ordering
of the channel output words. This function is defined over the set of
channel output words, $Y^n$, and is later chosen to optimize the result.
The channel output words are now ordered according to increasing
values of the information difference $I(\mathbf{x}, \mathbf{y}) = (1/n) \ln [f(\mathbf{y})/p(\mathbf{y} \mid \mathbf{x})]$
and each is again paired with the decoder output word $\mathbf{z}(\mathbf{y})$ to which
it is decoded by the optimum decoder.

The rearrangement of decoder output words is also slightly different.
To describe this rearrangement we visualize each channel output word,
$\mathbf{y}$, as "occupying" an interval of width $f(\mathbf{y})$ along the line [0, 1]. The
decoder output word, $\mathbf{z}(\mathbf{y})$, that is paired with a particular channel
output word $\mathbf{y}$ is also viewed as occupying the same region along [0, 1]
as $\mathbf{y}$, but, because any particular word $\mathbf{z}$ might be the decoding result of
several channel output words, the region along [0, 1] occupied by $\mathbf{z}$
could be a set of separated intervals. The rearrangement of decoder
output words is this time a rearrangement of occupancies in [0, 1]
toward the desired configuration wherein the decoder words are ordered
in increasing distortion along this line, and each occupies the same
total width in [0, 1] as it did before the ordering. Thus two monotone
nondecreasing functions can be defined along the line [0, 1]: one, $I(h)$,
giving the information difference $I(\mathbf{x}, \mathbf{y})$ at the point $h$, $0 \le h \le 1$, and
the other, $d(h)$, giving the distortion $d(\mathbf{w}, \mathbf{z})$ at $h$. The first theorem
presents a lower bound to the single word distortion in terms of these
two functions.

Theorem 1: The average transmission distortion, $d(\mathbf{w})$, conditioned on
the occurrence of the source word $\mathbf{w}$ and its transmission using the channel
input word $\mathbf{x}$, satisfies

$$d(\mathbf{w}) \ge \int_0^1 d(h)\, e^{-nI(h)}\, dh. \qquad (3)$$



Proof: Figure 4 is used to help prove the inequality. The distortion
resulting from optimum decoding is given by equation 2, the sum of the
conditional probability-distortion products on the previous list before
rearrangement of the decoder output words. For convenience this is
rewritten here as

$$d(\mathbf{w}) = \sum_{Y^n} d[\mathbf{w}, \mathbf{z}(\mathbf{y})]\, \frac{p(\mathbf{y} \mid \mathbf{x})}{f(\mathbf{y})}\, f(\mathbf{y}), \qquad (4)$$

which can be seen equal to the "volume" in Fig. 4a enclosed by the
two "amplitude functions" $d'$ and $p/f$ and the "width measure" $f$.

The rearrangement of the decoder output words to obtain the monotone
function $d(h)$ from $d'(h)$ can be accomplished by a sequence of
interchanges of the following type. We consider any two points in
$0 \le h \le 1$, say $h_1$ and $h_2$, for which $d'(h_2) \le d'(h_1)$ and
$(p/f)(h_2) \le (p/f)(h_1)$. If we consider an interval $\Delta h$ around each point in which
both amplitude functions are single valued and interchange amplitude
values of $d'$ in the two intervals, we effect a volume transformation
that decreases (or leaves unchanged) the total volume since

$$\begin{aligned}
\text{initial volume} - \text{final volume}
&= \left[ d'(h_1)\, \frac{p}{f}(h_1) + d'(h_2)\, \frac{p}{f}(h_2) \right] \Delta h
 - \left[ d'(h_2)\, \frac{p}{f}(h_1) + d'(h_1)\, \frac{p}{f}(h_2) \right] \Delta h \\
&= \left[ d'(h_1) - d'(h_2) \right] \left[ \frac{p}{f}(h_1) - \frac{p}{f}(h_2) \right] \Delta h \\
&\ge 0.
\end{aligned}$$

Volume interchanges of this type are repeated until the desired
monotonic function $d(h)$ is obtained. The resulting volume configuration
is then as shown in Fig. 4b. As each interchange of $\Delta h$ width
volumes decreases the total volume, or leaves it unchanged, the total
volume in Fig. 4b is certainly no larger than that in Fig. 4a. We need
now only notice that $(p/f)(h) = \exp[-nI(h)]$ to recognize that the
integral in equation 3 is equal to the volume in Fig. 4b, and, therefore,
to establish the inequality claimed in the theorem.
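The volume argument can be illustrated on a finite list of channel output words. The sketch below builds the two step functions on [0, 1], $p/f$ ordered by increasing information difference and the distortion ordered by increasing value with the same widths $f(\mathbf{y})$, and integrates their product. The data are hypothetical and the decoder is arbitrary, since the rearrangement inequality itself does not depend on which decoder produced the pairing.

```python
from fractions import Fraction as Fr

def integrate_step_product(steps_a, steps_b):
    """Integrate the product of two step functions on [0, 1].

    Each argument is a list of (height, width) pairs laid out left to right,
    with the widths of each list summing to 1.
    """
    total = Fr(0)
    i = j = 0
    left_a, left_b = steps_a[0][1], steps_b[0][1]
    while i < len(steps_a) and j < len(steps_b):
        width = min(left_a, left_b)          # overlap of the two current steps
        total += steps_a[i][0] * steps_b[j][0] * width
        left_a -= width
        left_b -= width
        if left_a == 0:
            i += 1
            left_a = steps_a[i][1] if i < len(steps_a) else Fr(0)
        if left_b == 0:
            j += 1
            left_b = steps_b[j][1] if j < len(steps_b) else Fr(0)
    return total

def actual_distortion(words):
    """Sum over output words of p(y|x) * d[w, z(y)]; words holds (f, p, d) triples."""
    return sum(p * d for _, p, d in words)

def rearranged_volume(words):
    """Volume after rearrangement: integral of d(h) * (p/f)(h) over [0, 1]."""
    # Increasing information difference means decreasing p/f (requires f > 0).
    ratio_steps = sorted(((p / f, f) for f, p, _ in words), reverse=True)
    # Distortion values in increasing order, each keeping its width f(y).
    dist_steps = sorted((d, f) for f, _, d in words)
    return integrate_step_product(ratio_steps, dist_steps)

# Hypothetical triples (f(y), p(y|x), d[w, z(y)]) for four channel output words.
words = [
    (Fr(1, 2), Fr(9, 16), Fr(0)),
    (Fr(1, 4), Fr(3, 16), Fr(1, 2)),
    (Fr(1, 8), Fr(3, 16), Fr(1)),
    (Fr(1, 8), Fr(1, 16), Fr(1, 2)),
]
print(actual_distortion(words), rearranged_volume(words))
```

For these numbers the rearranged volume is 29/128 against an actual conditional distortion of 5/16, in line with the direction of the inequality.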

To be sure, the construction in Fig. 4b, and the calculation of the
lower bound in equation 3, require some knowledge of the structure
of the optimum decoder. Fortunately, this knowledge is minimal; it is
only the total width along [0, 1] occupied by each member, $\mathbf{z}$, of the
decoding space $Z^n$. We refer to this occupancy as the "size" of the
decoding set for $\mathbf{z}$ and denote it by $g(\mathbf{z})$.

From the construction of the lower bound volume in Fig. 4b, we see
that

$$g(\mathbf{z}) = \sum_{Y(\mathbf{z})} f(\mathbf{y})$$

where $Y(\mathbf{z})$ is the set of channel output words that are decoded into $\mathbf{z}$
by the optimum decoder. Indeed, if we assume unique decoding by the
optimum decoder we have

$$\sum_{Z^n} g(\mathbf{z}) = \sum_{Z^n} \sum_{Y(\mathbf{z})} f(\mathbf{y}) = \sum_{Y^n} f(\mathbf{y}) = 1,$$

or that $g(\mathbf{z})$ is also a probability function. Even this function, though,
is unknown in the general case or at least is impractical to calculate.
The idea of the lower bound development, therefore, is to retain this
unknown probability function for the present and subsequently replace
it with another such function which minimizes the final lower bound
expression. Within this step an approximation involving the form of
$g(\mathbf{z})$ is required which is detailed in Section 6.2.

V. FURTHER EVALUATION OF THE LOWER BOUND IN THEOREM 1 

The integral in equation 3 can be simplified if we suppress the intermediate
variable $h$ and relate the variables $d$ and $I$ directly. The pairing
of $d$ and $I$ through a common value of $h$, $d(h) \leftrightarrow I(h)$, does not by itself
define a function because several different values of $d$ could be paired
with a given value of $I$, and vice versa. However, we will use the properties
that exist among these pairs to define a distortion function $d(I)$
which has the property that for any $I$, the dependent variable $d$ is at
least as small as the smallest $d(h)$ among the pairs that have $I(h) = I$.

To do this, we reinterpret the monotone nondecreasing functions
$d(h)$ and $I(h)$. First, we view the distortion $d(\mathbf{w}, \mathbf{z})$ as a random variable
on $Z^n$ governed by $g(\mathbf{z})$. Its cumulative distribution function

$$G(d) = \sum_{\mathbf{z} \in Z^n:\ d(\mathbf{w}, \mathbf{z}) \le d} g(\mathbf{z}) \qquad (5)$$

is then seen to be the "inverse" of $d(h)$. (Strictly speaking, the inverse
of a staircase function does not exist, so the term inverse is used here
only as an aid in relating $d(h)$ and $G(d)$ pictorially.) In a similar way
we also view the information difference $I(\mathbf{x}, \mathbf{y})$ as a random variable
on $Y^n$ governed by $f(\mathbf{y})$. Its cumulative distribution function is given by

$$F_1(I) = \sum_{\mathbf{y} \in Y^n:\ I(\mathbf{x}, \mathbf{y}) \le I} f(\mathbf{y}), \qquad (6)$$
or the "inverse" of $I(h)$. The desired function $d(I)$ can now be defined
in terms of $G(d)$ and $F_1(I)$ by relating to any information difference
value $I$ the distortion value that satisfies

$$F_1(I^-) = G(d). \qquad (7)$$

The following geometric interpretation of $d(I)$ might be helpful. If
each size, or "volume," $g(\mathbf{z})$ of the decoding sets is successively placed
about the volume $g(\mathbf{z}_1)$ of the decoded word with minimum distortion
$d(\mathbf{w}, \mathbf{z}_1)$, and each size, or "volume," $f(\mathbf{y})$ of the channel output words is
successively placed about the volume $f(\mathbf{y}_1)$ of the channel output word
with minimum information difference $I(\mathbf{x}, \mathbf{y}_1)$, the total volume included
by a point in the first construction at a distortion "radius"
$d$ is $G(d)$ and that included by a point in the second construction at an
information difference "radius" $I$ is $F_1(I)$. The function $d(I)$ then gives
(except for edge effects) the correspondence between the radii that
include the same volume in both geometrical constructions. Figure
5a illustrates the construction of $d(I)$ through the chain
$I \to F_1(I^-) = G(d) \to d$.

It is convenient at this point to introduce a second random variable
of information difference, one which is governed by $p(\mathbf{y} \mid \mathbf{x})$ rather than
$f(\mathbf{y})$. Its cumulative distribution function is

$$F_2(I) = \sum_{\mathbf{y} \in Y^n:\ I(\mathbf{x}, \mathbf{y}) \le I} p(\mathbf{y} \mid \mathbf{x}). \qquad (8)$$

To distinguish the two information difference variables, we will
denote by $I_1$ the variable that has the distribution function in equation 6
and by $I_2$ the variable that has the distribution function in
equation 8.

We are now in a position to rewrite the bound in Theorem 1 in
terms of functions that involve only $d$ and $I$. The distortion function
$d(I)$ has been constructed to lower bound all $d(h)$ with $I(h) = I$;
thus we can replace $d(h)$ in equation 3 with $d[I(h)]$. As this substitution
replaces $d(h)$ with a distortion function that is single valued
over subintervals of [0, 1] in which $I$ is a constant, we can perform
the integration in equation 3 by simply multiplying the integrand in
each such constant $I$ interval by the interval width, $dF_1(I)$, and
summing. Therefore, we can continue the inequality in equation 3
with

$$d(\mathbf{w}) \ge \int_{I_{\min}}^{I_{\max}} d(I) \exp(-nI)\, dF_1(I),$$

which, upon using $p(\mathbf{y} \mid \mathbf{x}) = \exp(-nI) f(\mathbf{y})$, establishes the lower bound
in the next theorem.

Fig. 5: The construction of (a) $d(I)$ and (b) $d_L(I)$.

Theorem 2: The average transmission distortion, $d(\mathbf{w})$, conditioned on
the occurrence of the source word $\mathbf{w}$ and its transmission using the channel
input word $\mathbf{x}$, satisfies

$$d(\mathbf{w}) \ge \int_{I_{\min}}^{I_{\max}} d(I)\, dF_2(I). \qquad (9)$$



VI. AN ESTIMATE OF THE FUNCTION d(I) 

6.1 The Random Variables $I_1$ and $I_2$

To obtain an estimate of $d(I)$ we require an estimate of the two distribution
functions, $G(d)$ and $F_1(I)$, from which $d(I)$ was defined. We
first focus on $F_1(I)$ and the random variable $I_1$. Since the lower bounds
in Theorems 1 and 2 can be derived for any choice of $f(\mathbf{y})$, we choose
a form of $f(\mathbf{y})$ that simplifies the following arguments. We specify that
$f(\mathbf{y})$ factors as

$$f(\mathbf{y}) = \prod_{m=1}^{n} f(y^m). \qquad (10)$$

One consequence of this assumed form is that the information difference
$I(\mathbf{x}, \mathbf{y})$ is given as a normalized sum of n letter information differences:

$$I(\mathbf{x}, \mathbf{y}) = \frac{1}{n} \sum_{m=1}^{n} \ln \frac{f(y^m)}{p(y^m \mid x^m)} = \frac{1}{n} \sum_{m=1}^{n} I(x^m, y^m). \qquad (11)$$

Among these n letter information differences, however, there are
different types, depending on the corresponding transmitted letter
$x^m$ in $\mathbf{x}$. To separate these, we introduce the vector $\mathbf{c}$ to denote the letter
composition of the channel input word $\mathbf{x}$, letting $\mathbf{c} = c_1, c_2, \ldots, c_K$
when there are $nc_1$ appearances of the letter $x_1$ in $\mathbf{x}$, $nc_2$ appearances
of $x_2$ in $\mathbf{x}$, and so on. Thus we can write the information difference in
equation 11 as

$$I(\mathbf{x}, \mathbf{y}) = \frac{1}{n} \sum_{k=1}^{K} \sum_{r=1}^{nc_k} I_{kr} \qquad (12)$$

in which $I_{kr}$ is used to denote the information difference between the
$r$th appearance of the letter $x_k$ in $\mathbf{x}$ and the corresponding letter in $\mathbf{y}$.
The interpretation of the $I_{kr}$ as letter information difference random
variables on $Y$ governed by the letter probability function $f(y)$ can
now be seen to be consistent with the previous interpretation of $I_1$
as a word information difference random variable on $Y^n$ governed by
$f(\mathbf{y})$. Using the abbreviations

$$f(y_l) = f_l, \qquad p(y_l \mid x_k) = p_{kl},$$

the probability distribution function of $I_{kr}$ can be written as

$$P_{I_{kr}}\!\left[ \ln \frac{f_l}{p_{kl}} \right] = f_l; \qquad 1 \le r \le nc_k; \quad 1 \le k \le K. \qquad (13)$$
What this has accomplished is to cast $I_1$ as the sum of n independent
random variables, a step that enables us to use large number laws to
estimate $F_1(I)$. 10,11

In an almost identical way, the random variable $I_2$ can be cast as a
sum of n independent random variables. This can be done if we associate
with the variable $I_{kr}$ the probability distribution function

$$P_{I_{kr}}\!\left[ \ln \frac{f_l}{p_{kl}} \right] = p_{kl}; \qquad 1 \le r \le nc_k; \quad 1 \le k \le K \qquad (14)$$

instead of that in equation 13. With this distribution the word information
difference variable $I(\mathbf{x}, \mathbf{y})$ in equation 12 can be seen to be governed
by the probability function $p(\mathbf{y} \mid \mathbf{x})$; therefore, it is equal to the random
variable $I_2$.
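The decomposition into letter terms can be checked directly: when $f(\mathbf{y})$ factors into letter probabilities, the word information difference $(1/n) \ln [f(\mathbf{y})/p(\mathbf{y} \mid \mathbf{x})]$ equals the average of the letter information differences. The sketch below uses a hypothetical binary channel and a hypothetical letter function; the names are illustrative only.

```python
import math

def word_information_difference(x, y, f_letter, p_letter):
    """(1/n) ln[f(y) / p(y|x)] computed from the word probabilities."""
    n = len(x)
    f_word = math.prod(f_letter[b] for b in y)
    p_word = math.prod(p_letter[a][b] for a, b in zip(x, y))
    return math.log(f_word / p_word) / n

def letter_average(x, y, f_letter, p_letter):
    """Average of the n letter information differences ln[f(y^m) / p(y^m | x^m)]."""
    terms = [math.log(f_letter[b] / p_letter[a][b]) for a, b in zip(x, y)]
    return sum(terms) / len(terms)

# Hypothetical letter function f and memoryless binary channel p(y | x).
f_letter = {0: 0.5, 1: 0.5}
p_letter = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
x = (0, 0, 1, 1)
y = (0, 1, 1, 1)

print(word_information_difference(x, y, f_letter, p_letter))
print(letter_average(x, y, f_letter, p_letter))
```

The two printed values agree up to floating-point rounding, which is what permits the word variable to be treated as a normalized sum of independent letter variables.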

6.2 The Random Variable d 

In the work so far, the function $g(\mathbf{z})$ is that probability function
induced on $Z^n$ by $f(\mathbf{y})$ through the optimum decoder function and cannot,
therefore, be freely chosen once $f(\mathbf{y})$ is chosen. On the other hand its
precise calculation from the optimum decoder is impractical. The only
alternative is to retain the unknown function $g(\mathbf{z})$ in the lower bound
expressions and to minimize the final lower bound to distortion over
all possible probability functions on $Z^n$. Since $g(\mathbf{z})$ is one such probability
function the inequality in the lower bound is continued. Unfortunately,
when this is done it cannot, in general, be shown that the function which
minimizes the lower bound factors into n letter probabilities, a form
which we were permitted to assume for $f(\mathbf{y})$. However, to proceed
beyond the bounds in Theorems 1 and 2, it is necessary to approximate
this $g(\mathbf{z})$ by such a product, as in

$$g(\mathbf{z}) = \prod_{m=1}^{n} g(z^m). \qquad (15)$$
The necessity for an approximation of this type arises, of course, from
the requirement that an estimate be made for the distribution function
$G(d)$. The assumed form for $g(\mathbf{z})$ in equation 15 will again allow us to
use large number laws to obtain this estimate.

More specifically, the assumed product form for $g(\mathbf{z})$ allows us to
cast the word distortion random variable $d(\mathbf{w}, \mathbf{z})$ as a sum of n independent
letter variables. This is done in the following way. Among the
letter distortions $d(w^m, z^m)$ that sum to the total word distortion there
are $H$ different types, corresponding to each of the different letters
$w_i$, $1 \le i \le H$, that appear in the source word $\mathbf{w}$.

If the composition of this word is $\mathbf{q} = q_1, q_2, \ldots, q_H$, that is, if
there are $nq_1$ appearances of $w_1$ in $\mathbf{w}$, $nq_2$ appearances of $w_2$, and so on,
the normalized word distortion can be written as

$$d(\mathbf{w}, \mathbf{z}) = \frac{1}{n} \sum_{i=1}^{H} \sum_{r=1}^{nq_i} D_{ir}. \qquad (16)$$

In this expression $D_{ir}$ is used to denote the distortion between the
$r$th appearance of the letter $w_i$ in $\mathbf{w}$ and the corresponding letter in
$\mathbf{z}$. Equation 15 now allows the interpretation of the $D_{ir}$ as independent
random variables, having the probability distributions

$$P_{D_{ir}}(d_{ij}) = g_j; \qquad 1 \le r \le nq_i, \quad 1 \le i \le H, \qquad (17)$$

where

$$d(w_i, z_j) = d_{ij}, \qquad g(z_j) = g_j,$$

with the result that $G(d)$ is an n-fold convolution of elementary distribution
functions for which there exist many estimating forms.
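For small $n$ the n-fold convolution can be carried out exactly. The sketch below assumes the product form for $g(\mathbf{z})$ and builds the distribution of $n\, d(\mathbf{w}, \mathbf{z})$ one source letter at a time. The distortion matrix, the letter probabilities `g`, and the function names are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction as Fr

def distortion_distribution(w, d_matrix, g):
    """Exact distribution of n * d(w, z) when the letters of z are independent draws from g.

    w: source word as a sequence of letter indices i.
    d_matrix[i][j]: letter distortion d_ij; g[j]: letter probability g_j.
    """
    dist = {Fr(0): Fr(1)}
    for letter in w:                       # one convolution per source letter
        nxt = defaultdict(Fr)
        for total, prob in dist.items():
            for j, g_j in enumerate(g):
                nxt[total + d_matrix[letter][j]] += prob * g_j
        dist = dict(nxt)
    return dist

def distortion_cdf(d, w, d_matrix, g):
    """G(d): probability under g that the normalized word distortion is at most d."""
    n = len(w)
    dist = distortion_distribution(w, d_matrix, g)
    return sum(prob for total, prob in dist.items() if total <= d * n)

# Hypothetical binary example with error-frequency distortion and uniform g.
d_matrix = [[0, 1], [1, 0]]
g = [Fr(1, 2), Fr(1, 2)]
w = (0, 1, 0)
print(distortion_cdf(Fr(1, 3), w, d_matrix, g))
```

The cost of this direct computation grows with the number of distinct partial sums, which is one practical reason for turning to tail estimates when $n$ is large.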

We realize that the approximation in equation 15 is not entirely 
satisfactory because it eliminates nonproduct probability functions from 
the minimization of the lower bound and, as far as we know, one of 
these functions could provide the minimization. However, there is 
good reason to believe that this approximation does not significantly 
affect the bound when n is reasonably large. For example, in the next 
several sections we derive a lower bound to distortion that uses the
product form in equation 15. For this bound the required minimization
over all probability functions $g(\mathbf{z})$ is reduced to one over all $J$-dimensional
vectors $\mathbf{g}$. It can be shown that if, in the limit as $n$ becomes large,
the product form requirement for $g(\mathbf{z})$ is relaxed, and the minimization
of this lower bound is again made over all probability functions $g(\mathbf{z})$,
then the optimizing function $g(\mathbf{z})$ still has the product form.

Even more significant is the asymptotic form of the lower bound that
is derived using equation 15. We later show that it is only the final
value of the minimizing decoder set size vector, $\mathbf{g}(n = \infty)$, that affects
both the asymptote of the lower bound, $d_C$, and the next lowest order
term, which is one proportional to $1/n$. Values of the minimizing vector
for finite $n$, $\mathbf{g}(n < \infty)$, affect only terms of $o(1/n)$.

Further, it can be shown that a similar conclusion is reached even
if the independence property assumed over letters in equation 15 is
generalized to be over blocks of length $r$, that is, if

$$g(\mathbf{z}) = \prod_{m=1}^{n/r} g(\mathbf{z}'^{m}), \qquad \mathbf{z}'^{m} = z_j, z_{j+1}, \ldots, z_{j+r-1}; \quad j = mr - r + 1.$$

When $g(\mathbf{z})$ is assumed to have this form, the minimization of the lower
bound over all decoder set sizes is a minimization over all probability
functions $g(\mathbf{z}')$ on $Z^r$. The conclusion that can be made from the bound
derived using this assumption is that it is again only the value of the
minimizing decoder set size function at $n = \infty$, $g(\mathbf{z}', \infty)$, that influences
both the asymptote and the term proportional to $1/n$. And,
at $n = \infty$, the minimizing decoder set size function on $Z^r$, $g(\mathbf{z}', \infty)$,
factors into a product of single letter probability functions on $Z$. When
this solution is substituted in the bound (that uses $r \ge 1$) the asymptotic
form is the same for every choice of the constant $r$. Only lower order
terms differ for different values of $r$.

There is one situation in which the assumed product form in equation 
15 does not represent an approximation. That is the case of a doubly 
uniform source, which is a source that has a uniform probability dis- 
tribution over its letters and has a distortion matrix in which each row 
and column is the respective permutation of another row and column. 
For such a source it has been shown 8 that the probability distribution 
g(z) which minimizes the lower bound in Theorem 1 is uniform for all 
n, thus has the factorability property in equation 15. 

6.3 A Lower Bound to d(I) 

We now seek an approximation to $d(I)$ that we can substitute in
equation 9 and preserve the inequality. A safe approximation to $d(I)$
can be had if, instead of equating $F_1(I^-)$ to $G(d)$ as in equation 7, we
equate an upper bound estimate of $G(d)$ to a lower bound estimate of
$F_1(I^-)$. Figure 5b illustrates this construction. The result is another
distortion function, $d_L(I)$, that satisfies

$$d_L(I) \le d(I) \qquad (18)$$

which can be used in equation 9 to obtain

$$d(\mathbf{w}) \ge \int_{I_{\min}}^{I_{\max}} d_L(I)\, dF_2(I). \qquad (19)$$

Since the random variable $I_2$ is a normalized sum of n independent
random variables, its variance is proportional to $1/n$. Consequently,
when $n$ becomes large the distribution function $F_2(I)$ has almost all
of its "rise" around the mean of $I_2$, which we denote by $\bar{I}$. In this
region, $I \approx \bar{I}$, $d \approx d(\bar{I})$, the values of both distribution functions $G(d)$
and $F_1(I)$ are exponentially small. Therefore, the bounds to the tails of
distribution functions 10-13 are applicable to the estimation of $G(d)$ and
$F_1(I)$ in this region. Indeed, it was with the intended use of these
powerful bounds that we formed both the distortion and information
difference random variables as sums of n independent letter random
variables. All of the bounds, though, are parametric in form and allow
only a parametric representation of $d_L(I)$.

We have elsewhere⁸ applied strict upper and lower bounds to G(d)
and $F_1(I)$, respectively, to obtain the function $d_L(I)$. However,
when these bounds are used, the resulting total lower bound to
transmission distortion, though applicable for all block lengths n,
does not reveal the correct asymptotic behavior inherent to the
sphere-packing procedure which has been used. (This happens because the
strict bounds to G(d) and $F_1(I)$ themselves do not have the correct
asymptotic form for large n.)

In addition, the resulting lower bound to the total distortion is
very complex and so does not provide much insight into the factors
which affect the rate of approach of the performance curve to its
asymptote. For these reasons, we instead use Shannon's¹¹ and
Gallager's¹³ asymptotic forms for the tails of distribution functions
to bound G(d) and $F_1(I)$. These are:



$$ G(d) \le \left[ \frac{1}{\sqrt{2\pi n s^2 \mu''(s)}} + A_G(n, s) \right] \exp\{ n[\mu(s) - s\mu'(s)] \} \tag{20a} $$

$$ \mu'(s) = d \tag{20b} $$

with

$$ 0 < d \le E(d \mid q) = \sum_z d(w, z \mid \text{comp } w = q) \, g(z) $$

and

$$ F_1(I) = \left[ \frac{1}{\sqrt{2\pi n t^2 \gamma''(t)}} + A_F(n, t) \right] \exp\{ n[\gamma(t) - t\gamma'(t)] \} \tag{21a} $$

$$ \gamma'(t) = I \tag{21b} $$

with

$$ I_{min} < I \le E(I_1 \mid c) = \sum_y I(x, y \mid \text{comp } x = c) \, f(y). $$

In these bounds, $A_G(n, s)$ and $A_F(n, t)$ are sums of rather difficult
integrals but each has been shown by Shannon and Gallager to be
$o(1/\sqrt{n})$.

Also within the previous bounds, we have used $\mu(s)$ to denote the
semi-invariant moment generating function of the variable d,

$$ \mu(s) = \sum_{i=1}^{H} q_i \mu_i(s) = \sum_{i=1}^{H} q_i \ln \sum_{j=1}^{J} g_j \exp(s d_{ij}), \tag{22} $$

and $\gamma(t)$ to denote the semi-invariant moment generating function
of the variable I,

$$ \gamma(t) = \sum_{k=1}^{K} c_k \gamma_k(t) = \sum_{k=1}^{K} c_k \ln \sum_{l=1}^{L} f_l^{1+t} p_{kl}^{-t}. \tag{23} $$

To guarantee the boundedness of $\gamma(t)$, we restrict the vector f to
have nonzero components. This does not affect the resulting bound.
(Actually, these bounds strictly apply only when the variables d and
I are nonlattice. For lattice variables the corresponding bounds¹¹,¹³
have in their coefficient a quantity which does not change
continuously with the argument of the distribution function, and
cannot be used within our derivation. One alternative would be to
decrease one assigned letter distortion d(w, z) by an arbitrarily small
irrational number, and similarly, to change two transition
probabilities on the channel in a way consistent with a lower bound to
distortion. The new variables d' and I' would then be nonlattice.)
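
As a minimal numerical sketch of equation 22 and of the exponential
factor in equation 20, the following code evaluates $\mu(s)$ and its
first two derivatives and compares $\exp\{n[\mu(s) - s\mu'(s)]\}$ with an
exact tail probability. The source (binary symmetric, Hamming
distortion), the uniform vector g, the block length, and all function
names are assumptions made here for illustration. Because this
distortion variable is a lattice variable, only the exponential
(Chernoff) factor is checked, not the coefficient in equation 20a.

```python
import math

def mu_terms(s, q, g, d):
    """Return (mu, mu', mu'') at s for mu(s) = sum_i q_i ln sum_j g_j exp(s d_ij)."""
    mu = mu1 = mu2 = 0.0
    for q_i, row in zip(q, d):
        weights = [g_j * math.exp(s * d_ij) for g_j, d_ij in zip(g, row)]
        z = sum(weights)
        mean = sum(w * d_ij for w, d_ij in zip(weights, row)) / z
        second = sum(w * d_ij * d_ij for w, d_ij in zip(weights, row)) / z
        mu += q_i * math.log(z)
        mu1 += q_i * mean                     # tilted mean of the distortion
        mu2 += q_i * (second - mean * mean)   # tilted variance of the distortion
    return mu, mu1, mu2

def exact_tail(n, k):
    """P[at most k ones in n fair coin flips], i.e. G(k/n) for the source below."""
    return sum(math.comb(n, i) for i in range(k + 1)) / 2.0 ** n

# Illustrative (assumed) source: binary symmetric, Hamming distortion, uniform g.
q = [0.5, 0.5]
g = [0.5, 0.5]
d = [[0.0, 1.0], [1.0, 0.0]]

n, k = 40, 8
target = k / n                                # distortion level d = 0.2
s = math.log(target / (1.0 - target))         # solves mu'(s) = d for this source
mu, mu1, mu2 = mu_terms(s, q, g, d)
exponent_bound = math.exp(n * (mu - s * mu1))
tail = exact_tail(n, k)
```

For other sources the parameter s would have to be found numerically
from $\mu'(s) = d$; the closed form used above holds only for this
symmetric example.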
The desired distortion function, $d_L(I)$, can now be defined by
equating the two bounds in equations 20 and 21. It can be constructed
through the chain $I^- \to t \to s \to d$, in which the superscript
could now be dropped since the bound to $F_1(I)$ is continuous in I.
It is important to notice that the region of validity of the previous
two bounds allows definition of the function $d_L(I)$ only in a
subinterval $[I_a, I_b]$ of $[I_{min}, I_{max}]$ with

$$ I_{min} < I_a < I < I_b \le E(I_1 \mid c), \; I[E(d \mid q)]. $$

Outside the interval $[I_a, I_b]$ we can define $d_L(I)$ equal to zero
and write the lower bound in equation 19 as

$$ d(w) \ge \int_{I_a}^{I_b} d_L(I) \, dF_2(I). \tag{24} $$

We are now faced with the difficult integration of a doubly
parametric expression. Rather than integrate directly, we use the
following Taylor series expansion for $d_L(I)$ within $[I_a, I_b]$:

$$ d_L(I) = d_L(\bar{I}) + d_L'(\bar{I})(I - \bar{I}) + \frac{1}{2} d_L''(\bar{I})(I - \bar{I})^2 + \frac{1}{6} d_L'''(I')(I - \bar{I})^3 = TS(d_L) $$

with $I_a < I' < I_b$. (The indicated derivatives can be shown to exist
within the restricted interval $[I_a, I_b]$.) Using this form for
$d_L(I)$ within equation 24 we see that if the region of integration
were $[I_{min}, I_{max}]$ instead of $[I_a, I_b]$, the resulting form
would be a sum of central moments of $I_2$ with the Taylor series
derivatives as coefficients. To restore this form we rewrite equation
24 as

$$ d(w) \ge \left[ \int_{I_{min}}^{I_{max}} - \int_{I_{min}}^{I_a} - \int_{I_b}^{I_{max}} \right] TS(d_L) \, dF_2(I). \tag{25} $$

In these integrals, the lower limit $I_{min}$ is finite since $f_l$ is
assumed nonzero for all l, and $I_{max}$ can be taken as the largest
finite value of $\ln(f_l/p_{kl})$ since this is the largest value of I
for which the random variable $I_2$ has nonzero probability. Therefore
the function $TS(d_L)$ is bounded in $[I_{min}, I_a]$ and
$[I_b, I_{max}]$ with the result that the last two integrals in
equation 25 are exponentially small in n. The first integral in this
equation has the desired form, involving the central moments of $I_2$:

$$ \int_{I_{min}}^{I_{max}} TS(d_L) \, dF_2(I) = d_L(\bar{I}) + d_L'(\bar{I}) E(I - \bar{I}) + \frac{1}{2} d_L''(\bar{I}) E[(I - \bar{I})^2] + \frac{1}{6} E[d_L'''(I')(I - \bar{I})^3]. $$

In the above equation the second term is zero since we have specified
that $\bar{I}$ is the expected value of $I_2$, and the last term can be
shown to be proportional to $(1/n)^2$. This establishes the result in
the next theorem.

Theorem 3: The conditional average transmission distortion, d(w),
satisfies

$$ d(w) \ge d_L(\bar{I}) + \frac{1}{2} d_L''(\bar{I}) \, \mathrm{Var}(I_2) + o\left(\frac{1}{n}\right). \tag{26} $$

Compared with the last low order term, the variance of $I_2$ is
proportional to 1/n.

The simplicity in the form of the last result is due to the use of 
the Taylor series expansion which not only has allowed us to evaluate 
a difficult integral, but has provided a natural way of separating the 
important terms in the lower bound to distortion. 
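
The effect of this expansion can be illustrated numerically in a much
simpler setting. The sketch below is not the paper's bound; it assumes a
smooth test function h and a normalized sum of n Bernoulli variables,
computes the exact mean of h, and compares it with the two-term form
h(mean) + (1/2) h''(mean) Var that has the same structure as equation
26. The remaining gap should fall off roughly as $1/n^2$. All names and
parameter values are illustrative.

```python
import math

def binom_pmf(n, k, p):
    """Binomial probability computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def exact_mean(h, n, p):
    """E[h(X)] where X is the average of n independent Bernoulli(p) variables."""
    return sum(h(k / n) * binom_pmf(n, k, p) for k in range(n + 1))

def two_term_form(h, h2, n, p):
    """h(mean) + (1/2) h''(mean) Var(X), with Var(X) = p(1 - p)/n."""
    return h(p) + 0.5 * h2(p) * p * (1.0 - p) / n

def gap(n, p=0.3):
    # h = exp is its own second derivative, which keeps the example short.
    return exact_mean(math.exp, n, p) - two_term_form(math.exp, math.exp, n, p)

gap_100 = gap(100)
gap_400 = gap(400)   # roughly gap_100 / 16 if the gap scales as 1/n^2
```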

6.4 The Evaluation of $d_L(\bar{I})$ and $d_L''(\bar{I})$

We shall denote by $s_0$ and $t_0$ the parameter values consistent with
$I = \bar{I}$ in equations 20 and 21. Since

$$ \gamma'(-1) = \sum_{k=1}^{K} c_k \sum_{l=1}^{L} p_{kl} \ln(f_l/p_{kl}), $$

which is seen equal to $E(I_2) = \bar{I}$, we can conclude that
$t_0 = -1$. We also note here for future use that

$$ \gamma(-1) = 0. $$
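
These two properties of $\gamma$ at t = -1 can be checked numerically.
The sketch below assumes the form
$\gamma(t) = \sum_k c_k \ln \sum_l f_l^{1+t} p_{kl}^{-t}$, which is the form
consistent with $\gamma(-1) = 0$ and with the expression for
$\gamma'(-1)$ above; the binary symmetric channel, its crossover
probability, and the function name are assumptions chosen for
illustration. With c and f set to the capacity-achieving (uniform)
vectors, $\gamma'(-1)$ comes out equal to -C.

```python
import math

def gamma_terms(t, c, P, f):
    """Return (gamma, gamma', gamma'') at t for
    gamma(t) = sum_k c_k ln sum_l f_l^(1+t) p_kl^(-t); requires all p_kl > 0."""
    g0 = g1 = g2 = 0.0
    for c_k, row in zip(c, P):
        logs = [math.log(f_l / p_kl) for f_l, p_kl in zip(f, row)]
        # f_l * exp(t * ln(f_l / p_kl)) equals f_l^(1+t) * p_kl^(-t)
        weights = [f_l * math.exp(t * x) for f_l, x in zip(f, logs)]
        z = sum(weights)
        mean = sum(w * x for w, x in zip(weights, logs)) / z
        second = sum(w * x * x for w, x in zip(weights, logs)) / z
        g0 += c_k * math.log(z)
        g1 += c_k * mean
        g2 += c_k * (second - mean * mean)
    return g0, g1, g2

eps = 0.1                                   # assumed crossover probability
P = [[1 - eps, eps], [eps, 1 - eps]]        # binary symmetric channel
c = [0.5, 0.5]                              # capacity-achieving inputs
f = [0.5, 0.5]                              # corresponding output probabilities
capacity = math.log(2) + eps * math.log(eps) + (1 - eps) * math.log(1 - eps)

g0, g1, g2 = gamma_terms(-1.0, c, P, f)     # g0 ~ 0, g1 ~ -capacity
```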
The first of the two significant terms in equation 26 is immediate:

$$ d_L(\bar{I}) = \mu'(s_0). $$

Next, elementary differentiation of the parametric expressions in
equations 20 and 21 provides

$$ d_L'(\bar{I}) = -\frac{1}{s_0} $$

and

$$ d_L''(\bar{I}) = \frac{1}{s_0} \left[ \frac{1}{\gamma''(-1)} - \frac{1}{s_0^2 \mu''(s_0)} \right]. $$

Finally, the variance of $I_2$ is seen from equation 12 to equal

$$ \mathrm{Var}(I_2) = \frac{1}{n} \sum_k c_k \mathrm{Var}(I_{kr}) = \frac{1}{n} \sum_k c_k \left[ \sum_l p_{kl} (\ln f_l/p_{kl})^2 - \left( \sum_l p_{kl} \ln f_l/p_{kl} \right)^2 \right] = \frac{1}{n} \gamma''(-1). $$

With the substitution of these terms in equation 26 we obtain the
result in the next theorem.

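Theorem 4: The conditional average transmission distortion, d(w),
satisfies

$$ d(w) \ge \mu'(s_0) - \frac{1}{2 n s_0} \left[ \frac{\gamma''(-1)}{s_0^2 \mu''(s_0)} - 1 \right] + o\left(\frac{1}{n}\right) \tag{27} $$

in which $s_0$ is given by

$$ \mu(s_0) - s_0 \mu'(s_0) = \bar{I} - \frac{1}{2n} \ln \frac{\gamma''(-1)}{s_0^2 \mu''(s_0)} + o\left(\frac{1}{n}\right). \tag{28} $$

It remains to average this lower bound over the entire source space
$W^n$.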

VII. THE AVERAGE OVER THE SOURCE SPACE 

To average the lower bound in Theorem 4 over the source space $W^n$
we assume that channel input words of equal composition are used for
all transmissions. It has been shown⁸ that this assumption does not
affect the asymptotic form of the lower bound to distortion. We first
notice that the lower bound in Theorem 4 depends upon the source
word w only through its composition q which enters in the form of
$\mu(s)$. Therefore, we can average d(w) over the set of all
compositions for w rather than over all of $W^n$. As all composition
vectors for w are probability vectors, they are all located on an
H - 1 dimensional hyperplane, termed the composition space $Q^H$, which
is in the "first quadrant" of $R^H$ and intersects each axis $q_i$ at
one. Not all points in $Q^H$ are possible word compositions for any
particular n. For example, with H = 2 and n = 2 there are only three
possible compositions. But as n increases, the points in $Q^H$ that are
source word compositions become quite dense.
The probability that any particular composition q occurs at the
source output is

$$ P(q) = N(q) \prod_{i=1}^{H} p_i^{n q_i} \tag{29} $$

in which N(q) is the number of distinct source sequences with the
composition q and the product is the probability of each. The number
N(q) is given by

$$ N(q) = \frac{n!}{\prod_{i=1}^{H} (n q_i)!}. $$
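
The composition probabilities in equation 29 are easy to tabulate for
small n. The following sketch enumerates every composition as a vector
of letter counts $n q_i$, evaluates N(q) and P(q), and confirms the
count of three compositions quoted above for H = 2 and n = 2. The
function names and the example probability vector are illustrative
assumptions.

```python
import math

def compositions(n, H):
    """All count vectors (n_1, ..., n_H) of nonnegative integers summing to n."""
    if H == 1:
        return [(n,)]
    return [(k,) + rest
            for k in range(n + 1)
            for rest in compositions(n - k, H - 1)]

def num_sequences(counts):
    """N(q) = n! / prod_i (n q_i)!, with counts[i] = n q_i."""
    out = math.factorial(sum(counts))
    for k in counts:
        out //= math.factorial(k)   # each partial quotient is an exact integer
    return out

def composition_probability(counts, p):
    """P(q) = N(q) * prod_i p_i^(n q_i), as in equation 29."""
    prob = float(num_sequences(counts))
    for k, p_i in zip(counts, p):
        prob *= p_i ** k
    return prob

p = (0.2, 0.3, 0.5)                 # assumed source letter probabilities
total = sum(composition_probability(c, p) for c in compositions(6, 3))
```

Since every source word has exactly one composition, the values P(q)
must sum to one, which is what `total` checks.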

We now write the total average source distortion, d(S), as

$$ d(S) = \sum_{\text{all source compositions}} d(q) P(q) $$

which we can lower bound by substituting for d(q) the lower bound
found in Theorem 4. Rather than write out the entire expression each
time we want to use it, we let $d_L(q)$ denote the right side of
equation 27, and thus have

$$ d(S) \ge \sum_{\text{all source compositions}} d_L(q) P(q). \tag{30} $$

Viewed as a function over $Q^H$, P(q) is a set of impulses. This allows
us to consider the distortion function $d_L(q)$ a continuous function
over all $Q^H$, rather than a function defined only at composition
points, and to write

$$ d(S) \ge \int \cdots \int_{Q^H} d_L(q) P(q) \, dq. \tag{31} $$

Again because the expression for $d_L(q)$ in equations 27 and 28 is
parametric, we use a Taylor series expansion of this distortion
function to evaluate the integral. The point chosen for the expansion
is p, the probability vector characterizing the source. The reason for
this choice is that the components of this vector are the means of the
coordinates of q when the latter are considered (dependent) random
variables governed by P(q). The Taylor series then contains terms of
the type $(q_i - p_i)$, $(q_i - p_i)(q_j - p_j)$, and so on, which, when
averaged by P(q), are the central moments of the components of q.

Using the notation $d'_{Li}(p)$ to indicate the partial derivative of
$d_L(q)$ with respect to $q_i$ evaluated at q = p (and similarly for
higher order derivatives), we have

$$ d(S) \ge \int \cdots \int_{Q^H} \Big[ d_L(p) + \sum_i d'_{Li}(p)(q_i - p_i) + \frac{1}{2} \sum_{i,j} d''_{Lij}(p)(q_i - p_i)(q_j - p_j) + \frac{1}{6} \sum_{i,j,k} d'''_{Lijk}(p')(q_i - p_i)(q_j - p_j)(q_k - p_k) \Big] P(q) \, dq \tag{32} $$

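with $p' \in Q^H$. The central moments of the components of q can be
found to be

$$ E(q_i - p_i) = 0, $$

$$ E[(q_i - p_i)(q_j - p_j)] = \frac{1}{n} (p_i \delta_{ij} - p_i p_j), $$

$$ E[(q_i - p_i)(q_j - p_j)(q_k - p_k)] = \frac{1}{n^2} [p_i \delta_{ijk} - p_i p_j \delta_{jk} - p_i p_k \delta_{ij} - p_k p_j \delta_{ik} + 2 p_i p_j p_k], \tag{33} $$

which, when substituted in equation 32, yields

$$ d(S) \ge d_L(p) + \frac{1}{2n} \Big[ \sum_i d''_{Lii}(p) p_i - \sum_{i,j} d''_{Lij}(p) p_i p_j \Big] + o\left(\frac{1}{n}\right). \tag{34} $$

Referring to equation 27 we see that the required second derivative
need only be taken of $\mu'(s_0)$, as the two 1/n coefficients allow
other terms to be absorbed in those of o(1/n). The differentiation is
lengthy, but straightforward, and yields

$$ d'_{Li}(p) = \frac{\mu_i(s_0)}{s_0} $$

and

$$ d''_{Lij}(p) = -\frac{\theta_i \theta_j}{s_0^3 \mu''(s_0)} $$

where

$$ \theta_i = \mu_i(s_0) - s_0 \mu_i'(s_0). $$

Upon substitution of these derivatives in equation 34 we obtain

$$ d(S) \ge d_L(p) - \frac{1}{2 n s_0^3 \mu''(s_0)} \Big[ \sum_i p_i \theta_i^2 - \sum_{i,j} p_i p_j \theta_i \theta_j \Big] + o\left(\frac{1}{n}\right) = d_L(p) - \frac{1}{2 n s_0^3 \mu''(s_0)} \mathrm{Var}(\theta) + o\left(\frac{1}{n}\right). $$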
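
The second and third central moments of the composition coordinates
used in this step can be verified by brute force for a small block
length. The sketch below enumerates all compositions, weights each by
its multinomial probability, and compares the resulting moments with
the closed forms $(p_i\delta_{ij} - p_i p_j)/n$ and
$[p_i\delta_{ijk} - p_i p_j\delta_{jk} - p_i p_k\delta_{ij} - p_k p_j\delta_{ik} + 2p_ip_jp_k]/n^2$.
The probability vector, block length, and helper names are assumptions
for illustration.

```python
import math

def composition_pmf(n, p):
    """Map each count vector to its multinomial probability P(q)."""
    H = len(p)

    def rec(remaining, idx):
        if idx == H - 1:
            yield (remaining,)
        else:
            for k in range(remaining + 1):
                for rest in rec(remaining - k, idx + 1):
                    yield (k,) + rest

    pmf = {}
    for counts in rec(n, 0):
        ways = math.factorial(n)
        for k in counts:
            ways //= math.factorial(k)
        prob = float(ways)
        for k, p_i in zip(counts, p):
            prob *= p_i ** k
        pmf[counts] = prob
    return pmf

def central_moment(pmf, n, p, idx):
    """E[prod_{i in idx} (q_i - p_i)] with q = counts / n."""
    total = 0.0
    for counts, prob in pmf.items():
        term = prob
        for i in idx:
            term *= counts[i] / n - p[i]
        total += term
    return total

def delta(*idx):
    """Kronecker delta: 1 when all indices coincide, else 0."""
    return 1.0 if len(set(idx)) == 1 else 0.0

def second_formula(n, p, i, j):
    return (p[i] * delta(i, j) - p[i] * p[j]) / n

def third_formula(n, p, i, j, k):
    return (p[i] * delta(i, j, k) - p[i] * p[j] * delta(j, k)
            - p[i] * p[k] * delta(i, j) - p[k] * p[j] * delta(i, k)
            + 2.0 * p[i] * p[j] * p[k]) / n ** 2

n = 5
p = (0.2, 0.3, 0.5)
pmf = composition_pmf(n, p)
```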
With the final substitution of the expression for $d_L(p)$ in equation
27 we have the result in the next theorem.

Theorem 5: The average transmission distortion of the source S, when
used with the channel 𝒞, is lower bounded by

$$ d(S) \ge \mu'(s_0, p) - \frac{1}{2 n s_0} \left[ \frac{\gamma''(-1) + \sigma^2}{s_0^2 \mu''(s_0, p)} - 1 \right] + o\left(\frac{1}{n}\right) \tag{35} $$

in which $\sigma^2 = \mathrm{Var}(\theta)$ and $s_0$ is given by

$$ \mu(s_0, p) - s_0 \mu'(s_0, p) = \bar{I} - \frac{1}{2n} \ln \frac{\gamma''(-1)}{s_0^2 \mu''(s_0, p)} + o\left(\frac{1}{n}\right). \tag{36} $$

In this bound the vector g is, for the reasons previously stated, that
which minimizes the bound; the vector f is chosen to maximize the
bound in order to obtain the tightest bound; and the vector c is chosen
to minimize the bound, that is, to use the best composition for the
channel input code words. As formidable as the derivations of these
extrema appear, we show in the next section that the work involved in
establishing the asymptotic behavior of the bound is actually quite
simple.

It should be mentioned that these results do not apply when
$\gamma''(-1) = 0$, which is a situation that occurs when the channel 𝒞
is noiseless, for the reason that we have divided by and canceled
factors equal to $\gamma''(-1)$. The result for this case is derived
separately in Section IX.

VIII. THE ASYMPTOTE AND RATE OF APPROACH 

8.1 The Asymptote 

When n becomes large, the limiting form of the bound in Theorem
5 is:

$$ d_\infty(S) \ge \mu'(s_0, p) $$

in which $s_0$ satisfies

$$ \mu(s_0, p) - s_0 \mu'(s_0, p) = \bar{I} $$

with

$$ \bar{I} = \sum_{k=1}^{K} c_k \sum_{l=1}^{L} p_{kl} \ln(f_l/p_{kl}). $$

The vectors g, f, and c must now be chosen to provide the extrema
indicated just after Theorem 5. Since only f and c enter in the
expression for $\bar{I}$, we can minimize $d_\infty(S)$ with respect to
g for a constant $\bar{I}$. This minimization provides precisely the
expression⁷ for the rate-distortion curve for S at the information
rate $\bar{I}$. It is further shown in the same reference that the
value of g which provides the minimization is the vector that
describes the output statistics on the test channel for S at the point
$(d_{\bar{I}}, \bar{I})$ on the rate-distortion curve.

The maximization and minimization of $d_\infty(S)$ with f and c,
respectively, can be accomplished by finding the same extremum of
$\bar{I}$. The resulting values for f and c are the output and input
probabilities, respectively, on channel 𝒞 when it is being used to
capacity and the value of $\bar{I}$ at the extremum point is -C.
Therefore, the resulting expression for the asymptote of the lower
bound is

$$ d(S) \ge \min_g \mu'(s_0, p) = d_C \tag{37} $$

with $s_0$ satisfying

$$ \mu(s_0, p) - s_0 \mu'(s_0, p) = -C. \tag{38} $$

This agrees with what we know to be the correct asymptote of the
performance curve.²,⁷
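
For a concrete instance of equations 37 and 38, consider a binary
symmetric source with Hamming distortion, for which the minimizing g is
taken here to be uniform (an assumption based on the symmetry of the
source). The sketch below solves $\mu(s_0) - s_0\mu'(s_0) = -C$ by
bisection and checks that the resulting $d_C = \mu'(s_0)$ lies on the
rate-distortion curve R(d) = ln 2 - H(d), in nats. The capacity value
and function names are illustrative.

```python
import math

def mu(s):
    """mu(s) for a binary symmetric source, Hamming distortion, uniform g."""
    return math.log((1.0 + math.exp(s)) / 2.0)

def mu_prime(s):
    return math.exp(s) / (1.0 + math.exp(s))

def solve_s0(capacity, lo=-50.0, hi=0.0, iters=200):
    """Bisection for s0 < 0 with mu(s0) - s0 mu'(s0) = -capacity (nats).

    The left side increases from -ln 2 to 0 as s goes from -infinity to 0,
    so a root exists for 0 < capacity < ln 2.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mu(mid) - mid * mu_prime(mid) + capacity > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

C = 0.3                                   # assumed channel capacity in nats
s0 = solve_s0(C)
d_C = mu_prime(s0)
rate_at_d_C = math.log(2) + d_C * math.log(d_C) + (1 - d_C) * math.log(1 - d_C)
```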

8.2 The Rate of Approach to the Asymptote 

Since the lower bound in equations 35 and 36 is parametric in $s_0$ and
includes the vectors f, c, and g, which when optimally chosen are
functions of n, the complete asymptotic dependence of this lower bound
upon the block length n is not obvious. To establish this dependence,
we first find the full derivative of the lower bound in Theorem 5 with
respect to n and then integrate the result between n and infinity.

We first simplify the procedure slightly by using our freedom to
choose f by setting this vector equal to its value at n = ∞, f(∞). This
does not change the end result. We also drop the terms of o(1/n) in
equations 35 and 36, because they clearly do not affect the asymptotic
result. Denoting the right side of equation 35 by $d_L$ and using the
chain rule several times, we can write the desired derivative as

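$$ \frac{d d_L}{dn} = \left( \frac{\partial d_L}{\partial n} \right)_{c,g,s} + \left( \frac{\partial d_L}{\partial s} \right)_{c,g,n} \frac{ds}{dn} + \sum_j \left( \frac{\partial d_L}{\partial g_j} \right)_{c,s,n,g_{i \ne j}} \frac{d g_j}{dn} + \sum_k \left( \frac{\partial d_L}{\partial c_k} \right)_{g,s,n,c_{i \ne k}} \frac{d c_k}{dn} $$

with

$$ \frac{ds}{dn} = \left( \frac{\partial s}{\partial n} \right)_{g,c} + \sum_j \left( \frac{\partial s}{\partial g_j} \right)_{c,n,g_{i \ne j}} \frac{d g_j}{dn} + \sum_k \left( \frac{\partial s}{\partial c_k} \right)_{g,n,c_{i \ne k}} \frac{d c_k}{dn}. $$

The notations outside each parenthesis indicate the variables which
are momentarily held constant. Substitution yields:

$$ \frac{d d_L}{dn} = \left( \frac{\partial d_L}{\partial n} \right)_{c,g,s} + \left( \frac{\partial d_L}{\partial s} \right)_{c,g,n} \left( \frac{\partial s}{\partial n} \right)_{g,c} + \sum_j \left[ \left( \frac{\partial d_L}{\partial s} \right) \left( \frac{\partial s}{\partial g_j} \right)_{g_{i \ne j}} + \left( \frac{\partial d_L}{\partial g_j} \right)_{s,g_{i \ne j}} \right] \frac{d g_j}{dn} + \sum_k \left[ \left( \frac{\partial d_L}{\partial s} \right) \left( \frac{\partial s}{\partial c_k} \right)_{c_{i \ne k}} + \left( \frac{\partial d_L}{\partial c_k} \right)_{s,c_{i \ne k}} \right] \frac{d c_k}{dn}. $$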

The bracketed terms represent the respective partial derivatives
of $d_L$ with respect to $g_j$ and $c_k$ with s removed from those
quantities held constant. Since g(n) and c(n) are chosen for each value
of n to minimize the lower bound $d_L$, these partial derivatives must
satisfy

$$ \left( \frac{\partial d_L}{\partial g_j} \right)_{g_{i \ne j}} + \lambda = 0, \qquad 1 \le j \le J \tag{39} $$

$$ \left( \frac{\partial d_L}{\partial c_k} \right)_{c_{i \ne k}} + \nu = 0, \qquad 1 \le k \le K. \tag{40} $$

This presumes that, at least for sufficiently high n, both g and c have
only nonzero components. This is known to be true for c,¹⁴ which at
n = ∞ equals the channel input probabilities that use the channel to
capacity.

The vector g, though, can at n = ∞ have a zero component. For this
case, if the approach of g(n) to g(∞) is from within the composition
space, that is, if the components of g(n < ∞) are nonzero, equation
39 is correct as written for all finite n. If, however, the approach of
g(n) to g(∞) is along the boundary of the composition space, that is,
having one or more components equal to zero for all n > N, then
equation 39 can be written, not for all 1 ≤ j ≤ J, but only for the J'
nonzero components. Over the region (N, ∞) the other J - J' zero
components obviously can be treated as constants and not included in
the differentiation process, and thus excluded from the previous
summations on j. We shall not attempt to deal with the only remaining
possibility, which has g(n) approaching g(∞) such that it oscillates
between vector values with all nonzero components and values with some
zero components, since no example has been found exhibiting this
behavior.

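We continue the derivation by substituting equations 39 and 40
into the derivative of $d_L$ to obtain

$$ \frac{d d_L}{dn} = \left( \frac{\partial d_L}{\partial n} \right)_{c,g,s} + \left( \frac{\partial d_L}{\partial s} \right)_{c,g,n} \left( \frac{\partial s}{\partial n} \right)_{c,g} - \lambda \sum_j \frac{d g_j}{dn} - \nu \sum_k \frac{d c_k}{dn}. \tag{41} $$

Finally, since both g and c are probability vectors, the last two sums
are equal to zero (this is true even when the first sum is only over the
J' nonzero components of g). It remains only to find the required
partial derivatives from equations 35 and 36. These are given by:

$$ \left( \frac{\partial d_L}{\partial n} \right)_{c,g,s} = \frac{1}{2 n^2 s} \left[ \frac{\gamma'' + \sigma^2}{s^2 \mu''} - 1 \right] $$

$$ \left( \frac{\partial d_L}{\partial s} \right)_{c,g,n} = \mu'' + o(1) $$

$$ \left( \frac{\partial s}{\partial n} \right)_{c,g} = -\frac{1}{2 n^2 s \mu''} \ln \frac{\gamma''}{s^2 \mu''} $$

whence substitution in equation 41 provides

$$ \frac{d d_L}{dn} = \frac{1}{2 n^2 s} \left[ \left( \frac{\gamma''}{s^2 \mu''} - 1 \right) - \ln \frac{\gamma''}{s^2 \mu''} + \frac{\sigma^2}{s^2 \mu''} \right] + o\left(\frac{1}{n^2}\right). \tag{42} $$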



At this point, the vectors g, c and the parameter s are still functions
of n chosen to satisfy the prescribed minimizations of equation 35
and the parametric equation 36. If, for large n, these functions are
written as

$$ g(n) = g(\infty) + \Delta g(n) $$

$$ c(n) = c(\infty) + \Delta c(n) $$

$$ s(n) = s(\infty) + \Delta s(n), $$

the delta terms can be extracted from the first term in equation 42.
Since each has limit zero for large n, they can, together with the
$1/n^2$ coefficient, be absorbed into the terms of $o(1/n^2)$. Thus, in
equation 42, we can use for g, c, and s their final values: g(∞), c(∞),
and s(∞). Simple integration of equation 42 between n and infinity,
and the use of the known final value of $d_L(n)$, $d_L(\infty) = d_C$,
provides the final lower bound to distortion. We again point out that
the derivation has included the approximation that g(z) factors as in
equation 15.

Theorem 6: A lower bound to the minimum attainable transmission
distortion in a system that includes the source S and the channel 𝒞 is
given by

$$ d(S) \ge d_C + \frac{1}{2 n |s|} \left[ \left( \frac{\gamma''}{s^2 \mu''} - 1 \right) - \ln \frac{\gamma''}{s^2 \mu''} + \frac{\sigma^2}{s^2 \mu''} \right] + o\left(\frac{1}{n}\right) \tag{43} $$

in which

C = capacity of 𝒞
$d_C$ = the distortion at R = C on the rate-distortion curve for S
$\mu(s) = \sum_i q_i \ln \sum_j g_j \exp(s d_{ij})$
q = p, the source output probabilities
g = the output probabilities on the test channel for S at $(d_C, C)$
c, f = the input and output probabilities on 𝒞 when it is used to
capacity
t = -1
s satisfies $\mu - s\mu' = -C$.

The lower bound in equation 43 is seen to approach its limit
algebraically as a/n. Since (w - 1) is at least as large as ln w for
any w > 0, and $\sigma^2$ and $\mu''$ are variances, hence nonnegative,
the coefficient a cannot be negative. But it can in special cases equal
zero. The conditions for this are

$$ \gamma'' = s^2 \mu'', \qquad \sigma^2 = 0, $$

conditions that are necessarily met when the source and channel are
perfectly matched; that is, when $d(S) = d_C$ for all n.

They do not, however, constitute a sufficient condition for matching
since the low order correction terms in equation 43 could still be
nonzero. For the more common situations wherein a is nonzero, the form
of the lower bound suggests that the larger the value of a, the longer
the coding block length must be to obtain a tolerable level of
distortion, $d_C + \Delta$. In turn, the more complex the modulator and
demodulator must become. These relations all suggest the utility of the
coefficient a as a measure of mismatch between the source S and the
channel 𝒞; the larger the value of a, the slower the approach of the
lower bound to its asymptote and the greater the mismatch between
source and channel. Section X gives several numerical examples
illustrating different types of mismatch.
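
As a hedged numerical sketch of the coefficient a, the code below
evaluates
$a = \frac{1}{2|s|}[(w - 1) - \ln w + \sigma^2/(s^2\mu'')]$ with
$w = \gamma''/(s^2\mu'')$, and applies it to a binary symmetric source
used with a single binary symmetric channel of crossover probability p.
For that pair $d_C = p$, $s = \ln[p/(1-p)]$, $\mu'' = p(1-p)$, and
$\sigma^2 = 0$ (the source is doubly uniform), so the coefficient should
vanish. The function names and the value of p are assumptions for
illustration.

```python
import math

def mismatch_coefficient(gamma2, s, mu2, sigma2=0.0):
    """Coefficient a of the 1/n term in the lower bound of Theorem 6."""
    w = gamma2 / (s * s * mu2)
    return ((w - 1.0) - math.log(w) + sigma2 / (s * s * mu2)) / (2.0 * abs(s))

def bsc_gamma2(p):
    """gamma''(-1) for a binary symmetric channel with uniform f:
    the variance of ln(f/p_kl) under the transition probabilities."""
    return p * (1.0 - p) * math.log((1.0 - p) / p) ** 2

p = 0.1                                   # assumed crossover probability
s = math.log(p / (1.0 - p))               # parameter at d_C = p
mu2 = p * (1.0 - p)                       # mu''(s) for the binary symmetric source

a_matched = mismatch_coefficient(bsc_gamma2(p), s, mu2)
a_perturbed = mismatch_coefficient(2.0 * bsc_gamma2(p), s, mu2)  # hypothetical larger gamma''
```

The second call simply doubles $\gamma''$ to show that any departure of w
from one makes the coefficient strictly positive; it does not
correspond to a particular channel.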

IX. THE SPECIAL CASE OF A NOISELESS CHANNEL 

As we have stated, Theorem 5 cannot be applied when 𝒞 is noiseless
because factors equal to $\gamma''(-1)$ have been canceled within its
derivation and, for a noiseless channel, $\gamma''(-1)$ equals zero. We
return to the lower bound in equation 3 which is still valid. If the
vector f is chosen uniform over $Y^n$, we see from the definition of a
noiseless channel ($L^n$ outputs) and the definition of information
difference in Section IV that I(x, y) is equal to ln(1/L) for the
output $y_i$ that has $p(y_i \mid x) = 1$, and is infinite for all other
outputs. Since $f(y_i) = L^{-n}$, $e^{-nI(h)}$ is nonzero only in
$0 \le h \le L^{-n}$, where it is equal to $L^n$. Therefore, equation 3
can be written as

$$ d(w) \ge L^n \int_0^{L^{-n}} d(h) \, dh. \tag{44} $$

We remember that the distribution function G(d) is the "inverse"
function to d(h) and write

$$ d(w) \ge L^n \int_0^{d(L^{-n})} [L^{-n} - G(d)] \, dd $$

which can be continued, with any $d_2 < d(L^{-n})$, by

$$ d(w) \ge L^n \int_0^{d_2} [L^{-n} - G(d)] \, dd. $$

Upon dividing the region of integration into two parts,
$0 < d_1 < d_2$, and using the monotonicity of G(d), we have

$$ d(w) \ge d_2 - L^n d_1 G(d_1) - L^n \int_{d_1}^{d_2} G(d) \, dd. \tag{45} $$

A further lower bound results if we use an upper bound to G(d) in
each of the last two terms. In particular, we use the asymptotic
bound in equation 20 which we denote here by

$$ G(d) \le H(n, s) \exp\{ n[\mu(s) - s\mu'(s)] \} \tag{46} $$

$$ \mu'(s) = d. $$

We now set $d_2$ equal to $\mu'(s_0)$ with $s_0$ given by

$$ H(n, s_0) \exp\{ n[\mu(s_0) - s_0 \mu'(s_0)] \} = L^{-n} = e^{-nC}. \tag{47} $$

The fact that $G(d_2) < L^{-n}$ guarantees the inequality
$d_2 < d(L^{-n})$ which we have already used. The second term in
equation 45 can be shown to be exponentially small in n whenever
$d_1 < d_2$; therefore, we also impose this inequality. To bound the
last term in the same equation we use the well known Chernoff bound
inequality:

$$ \exp\{ n[\mu(s) - s\mu'(s)] \} \le \exp\{ n[\mu(s_0) - s_0 d] \}, \qquad \mu'(s) = d, $$

together with equations 46 and 47 to obtain

$$ L^n \int_{d_1}^{d_2} G(d) \, dd \le B \, e^{n s_0 \mu'(s_0)} \int_{d_1}^{d_2} e^{-n s_0 d} \, dd $$

with

$$ B = \max_{d_1 \le d \le d_2} \frac{H(n, s)}{H(n, s_0)}. $$

The resulting bound for d(w), therefore, is

$$ d(w) \ge \mu'(s_0) + \frac{1}{n s_0} \left[ 1 - \exp\{ n s_0 (\mu'(s_0) - d_1) \} \right] + o\left(\frac{1}{n}\right). $$

If $d_1$ is chosen in a way to approach $\mu'(s_0)$ with increasing n,
this bound becomes:

$$ d(w) \ge \mu'(s_0) + \frac{1}{n s_0} [1 + o(1)] \tag{48} $$

in which $s_0$ satisfies equation 47, rewritten here as

$$ \mu(s_0) - s_0 \mu'(s_0) = -C - \frac{1}{n} \ln H(n, s_0) = -C + \frac{1}{2n} \ln n \, [1 + o(1)]. \tag{49} $$



The remaining steps, averaging over the source space and minimizing
the resulting bound over all choices of g (we continue to use the
approximation in equation 15), are identical in procedure to those
previously used. We state only the result.

Theorem 7: The minimum attainable transmission distortion of the
source S, when used with a noiseless channel of capacity C, satisfies

$$ d(S) \ge d_C + \frac{1}{2 |s_0|} \frac{\ln n}{n} [1 + o(1)] \tag{50} $$

in which $s_0$ satisfies

$$ \mu(s_0, p) - s_0 \mu'(s_0, p) = -C. \tag{51} $$

We see by comparing equations 43 and 50 that while the lower
bound to distortion with a noisy channel approaches its asymptote,
$d_C$, as 1/n, the lower bound to distortion with a noiseless channel
approaches $d_C$ only as (ln n)/n. These bounds are not inconsistent
since for a noiseless channel the variance $\gamma''$ is zero with the
result that the coefficient of 1/n in equation 43 is infinite. A similar
limiting statement is also true. If a noisy channel is made to approach
a noiseless one by reducing the noisy transition probabilities toward
zero, at the same time keeping the channel capacity constant by
appropriately reducing either the channel input alphabet size or the
channel dimensionality, the coefficient of the 1/n term increases and is
unbounded. These results therefore suggest that when there is a choice
between using a noiseless channel or a noisy one of equal capacity,
the noisy channel is always the better choice. And, inasmuch as we
are using the coefficient of the 1/n term to measure the source-channel
mismatch, the noiseless channel represents the worst possible
match to any source.
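
The difference between the two rates can be made concrete by comparing
only the leading correction terms, a/n for the noisy channel and
$(\ln n)/(2|s_0| n)$ for the noiseless one. The sketch below does this
for hypothetical values of a and $s_0$ (neither is taken from the
examples of this paper) and shows that the ratio of the two gaps grows
as ln n, so the noiseless correction eventually dominates for any
finite a.

```python
import math

def noisy_gap(n, a):
    """Leading correction a/n of the noisy-channel lower bound."""
    return a / n

def noiseless_gap(n, s0):
    """Leading correction (ln n) / (2 |s0| n) of the noiseless-channel bound."""
    return math.log(n) / (2.0 * abs(s0) * n)

a, s0 = 0.5, -1.0          # hypothetical coefficient and parameter value
block_lengths = (10, 100, 1000, 10000)
ratios = [noiseless_gap(n, s0) / noisy_gap(n, a) for n in block_lengths]
```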

X. EXAMPLES 

In the first three examples, we illustrate different types of source- 
channel mismatch and calculate the effect of each upon the coefficient 
a in the lower bound of equation 43. Each of these examples tends to 
strengthen the suggestion in the lower bound result that this coef- 
ficient is a measure of source-channel mismatch since it increases 
monotonically as the channel is perturbed away from the matching 
channel. 

Because the channel statistics influence only the first two terms of
a, we use in these examples a doubly uniform source for which the
$\sigma^2$ term equals zero. To further isolate the relative matching
properties of the source-channel pairs, we keep constant the channel
capacity per source output, C, as the channel is varied. Thus the
distortion per source component has the same asymptote, $d_C$, for all
source-channel pairs and the only difference in the lower bound curves,
at least asymptotically, is in the coefficient a.

Example 1 

This example illustrates a dimensionality, or coding block length,
mismatch between a source and channel. We take for the source S
the $m_s$'th product of a binary symmetric source, defined by
p = (1/2, 1/2) and $d_{11} = d_{22} = 0$, $d_{12} = d_{21} = 1$. For the
channel 𝒞 we take the $m_c$'th product of a binary symmetric channel,
each component $𝒞_i$ having a crossover probability p. The channel
capacity per source component is $m_c/m_s$ times the capacity of $𝒞_i$
and is kept constant as $m_c/m_s$ is varied by appropriately changing
the crossover probabilities p.

Figure 6 shows the dependence of a upon $m_c/m_s$. When comparing
the two curves in this figure, notice that the ordinate has been
normalized by $d_C$. We know that for $m_c/m_s = 1$ the source and
channel are precisely matched and this is indicated in the figure by
the value a = 0 at that point. Above this point a increases
monotonically in $m_c/m_s$ and can be shown to have the asymptotic form
a ~ k($m_c/m_s$)*. Below $m_c/m_s = 1$, a also becomes unbounded as
$m_c/m_s$ approaches the ratio that requires each component channel
$𝒞_i$ be noiseless. This is not inconsistent with the noiseless channel
result (equation 50) which indicated that the rate of approach of the
distortion to $d_C$ was not as a/n but as (ln n)/n.
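
The way the crossover probability must change with $m_c/m_s$ to hold the
capacity per source component fixed can be sketched as follows. The
code solves $(m_c/m_s)[\ln 2 - H(p)] = C$ for p by bisection, with
capacities in nats. The function names and the reference crossover
probability of 0.1 are assumptions for illustration. The smallest
admissible ratio, $C/\ln 2$, is the one at which each component channel
must be noiseless.

```python
import math

LN2 = math.log(2.0)

def bsc_capacity(p):
    """Capacity in nats of a binary symmetric channel, 0 <= p <= 1/2."""
    if p == 0.0:
        return LN2
    return LN2 + p * math.log(p) + (1.0 - p) * math.log(1.0 - p)

def crossover_for_ratio(c_source, ratio, iters=200):
    """Crossover probability p with ratio * bsc_capacity(p) = c_source.

    ratio stands for m_c/m_s; c_source is the capacity per source component.
    """
    if ratio * LN2 < c_source:
        raise ValueError("ratio too small: even noiseless components fall short")
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ratio * bsc_capacity(mid) > c_source:
            lo = mid          # capacity still too high: more noise is allowed
        else:
            hi = mid
    return 0.5 * (lo + hi)

c_source = bsc_capacity(0.1)              # assumed reference: p = 0.1 at ratio 1
p_matched = crossover_for_ratio(c_source, 1.0)
p_double = crossover_for_ratio(c_source, 2.0)
```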

Example 2 

Here we do not change the relative dimensionality, only the form
of the channel. The source is a binary symmetric source and the
channel a binary nonsymmetric channel of varying asymmetry. The
crossover probabilities are again changed in a way that does not vary
the capacity. We see in Fig. 7 that a is rather insensitive to small
perturbations from a binary symmetric channel and in most cases is
affected less by this type of mismatch than by a dimensionality
mismatch. A similar result obtains if the source is also allowed to be
nonsymmetric.

Fig. 6 - The mismatch between a binary symmetric source and a binary
symmetric channel of different dimensionality.

Fig. 7 - The mismatch between a binary symmetric source and a binary
nonsymmetric channel.

Example 3 

For this example we use a binary symmetric source and a discrete
channel which models the m orthogonal signal modulator used in the
next example. The channel has m inputs and m outputs and has
from each input one transition of probability 1 - (m - 1)p and m - 1
transitions of probability p. The numbers m and p are varied together
in such a way that the capacity of the channel remains constant. We see
in Fig. 8 that the mismatch coefficient a is much higher when the
binary symmetric source is used with this channel than when it is used
with that product binary symmetric channel of Example 1 which has
available an input alphabet of equal size. The comparison can be made
on Figures 6 and 8 at points for which $m_c/m_s = \log_2 m$.
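
A minimal sketch of how m and p can be varied together at constant
capacity follows. It assumes that the high-probability transition from
each input goes to a distinct output, so that the channel is symmetric
and its capacity in nats is ln m minus the entropy of one row of the
transition matrix. The function names and the target capacity are
illustrative.

```python
import math

def m_ary_capacity(m, p):
    """Capacity in nats of the symmetric m-input, m-output channel with one
    transition of probability 1 - (m - 1)p and m - 1 of probability p."""
    stay = 1.0 - (m - 1) * p
    total = math.log(m)
    for prob, count in ((stay, 1), (p, m - 1)):
        if prob > 0.0:
            total += count * prob * math.log(prob)
    return total

def transition_for_capacity(m, target, iters=200):
    """Find p in [0, 1/m] with m_ary_capacity(m, p) = target (bisection).

    The capacity falls monotonically from ln m at p = 0 to 0 at p = 1/m.
    """
    if not 0.0 < target <= math.log(m):
        raise ValueError("target capacity must lie in (0, ln m]")
    lo, hi = 0.0, 1.0 / m
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if m_ary_capacity(m, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

target = 0.3                                  # assumed capacity in nats
p_by_m = {m: transition_for_capacity(m, target) for m in (2, 4, 8, 16)}
```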

Example 4 

In this, the last example, we include in the system a continuous
channel which is to be used by a discrete source with a discrete
modulator. Now, as the modulator changes, the discrete channel
extracted from the actual channel changes and both its capacity and its
matching characteristics change. It turns out that both properties are
not necessarily optimized for the same modulator structure and,
therefore, one must strike a compromise (influenced by the block length
of interest) between a modulator design that minimizes the asymptote
$d_C$ and one that maximizes the rate of approach to $d_C$.

To illustrate this we assume the channel to be a band-limited channel
with additive white gaussian noise in the allowed bandwidth.
During the interval (0, T), the discrete modulator is constrained to
transmit one of m orthogonal signals in each of B bauds and altogether
an energy no greater than E. To model the bandwidth constraint the mB
product is assumed constant, but m and B can otherwise be varied to
optimize the system. Thus the equivalent discrete channel is the B'th
product of the m input doubly uniform channel of Example 3. The source
to be transmitted is a binary symmetric source with an output rate of
M B digits every T seconds.

In Fig. 9 we show the minimum attainable distortion $d_C$ (determined
through the channel capacity) and the mismatch coefficient a as a
function of m. For the values shown in the figure, we see that while
$d_C$ is minimized at m = 15, the coefficient a is then quite large.
And, around m = 22, where a = 0, the minimum distortion $d_C$ is
higher than that which can be realized with a smaller m. The conclusion
from this is that the modulator should be designed with m = 15 (to
maximize capacity and minimize $d_C$) only when one is willing to use
very long coding block lengths. For shorter block lengths, a larger
value of m, and a corresponding smaller value of a, could result in a
smaller average distortion even with the larger value of $d_C$. For
this example a compromise design with m about 19 would probably be
best over a range of intermediate block lengths.

Fig. 8 - The mismatch between a binary symmetric source and the
m-orthogonal signal channel. (Ordinate: $d_C a$; curves labeled
$d_C$ = 0.1 and 0.01.)

Fig. 9 - The influence of the modulator design in Example 4 on the
minimum attainable distortion and the mismatch coefficient.

It is interesting to notice in this example that the coefficient a can
be zero even when the source and channel are not matched. This is
consistent with our previous interpretation of a = 0 as a necessary
but not sufficient condition for matching. We remember that the
coefficient a being zero does not imply that the lower bound in
equation 43 is precisely $d_C$ for all n. There are several other terms
of o(1/n) in this equation that have not been specified which are not
necessarily zero when a = 0.

XI. THE UPPER BOUND 

Now let us present an upper bound to the minimum attainable
transmission distortion as a function of the coding block length. As
with the lower bound, the upper bound approaches the asymptote $d_C$,
but only as $[(\ln n)/n]^{1/2}$. The reason for the difference, we
believe, is that within the upper bound derivation the transmitting
signal set was restricted to contain at most $M = e^{nC}$ members, a
restriction that was not necessary to impose in the lower bound. We
also present an upper bound to the transmission distortion with a
noiseless channel. This bound does agree, asymptotically, with the
corresponding lower bound.



XII. THE RANDOM CODING ARGUMENT 



All of the upper bound derivations in this paper use random coding
arguments. That is, we do not explicitly find the encoder and decoder
which, when used with S and 𝒞, provide the distortion in the upper
bound, but show that one pair does exist. More specifically, we
construct a set of encoder-decoder pairs with a probabilistic rule
according to which each system is selected to be used. This defines an
ensemble of transmission systems, each with its own distortion,
corresponding to all possible coding selections. What we calculate is a
bound to the average distortion of this ensemble. Clearly, this
provides an upper bound to the minimum distortion in the ensemble,
hence to the minimum attainable distortion in any system that includes
S and 𝒞.

12.1 The Construction of the Ensemble 

We denote the set of points on the rate-distortion curve for $\mathcal{S}$ by
$\{(d_R, R)\}$ and assume the capacity of $\mathcal{C}$ to be $C$. We first
choose any point $(d^*, R^*)$ on the rate-distortion curve below $(d_c, C)$
and design the code in such a way that the ensemble average distortion
approaches $d^*$ with increasing block length. We know this to be possible
from Shannon's results (Ref. 2). Moreover, we expect, since the situation is
somewhat analogous to a channel coding problem with $R^* < C$, that the
distortion can be made to approach $d^*$ exponentially fast. The point
$(d^*, R^*)$ is subsequently varied to obtain the best result at any
particular block length of interest.

For any selection of $(d^*, R^*)$, we then choose the number of signal
points, $M = e^{nR}$, used to transmit $\mathcal{S}$. To attain a transmission
distortion level $d^*$, we certainly must have the number of signal points
large enough to represent the source to at least within $d^*$, and this
requires that $R$ be greater than $R^*$. We also require that $R$ be less
than $C$ so that in the limit as $n$ becomes large, we are guaranteed correct
decoding among the signal points at the receiver. Therefore we have

$$R^* < R < C \tag{52}$$

and, for the corresponding values of distortion on the rate-distortion
curve,

$$d_{\max} \ge d^* > d_R > d_c. \tag{53}$$

The value of $R$ can also later be chosen to optimize the result.

An ensemble of codes of length $n$ is constructed for each selection of
$R$ and $R^*$. We use the probability distribution $p(x, z)$ to generate the
ensemble by picking, according to $p(x, z)$, $M$ independent pairs $(x, z)$
from $X^n Z^n$. Thus we have a set of codes containing all possible mappings
of the integers 1 through $M$ into pairs of $n$-letter words $(x, z)$, or
$(JK)^{nM}$ codes in total. (We continue to use here the notation defined in
the earlier part of the paper dealing with the lower bound.) Each of these
codes has the associated probability

$$\Pr(\text{code}) = \prod_{i=1}^{M} p(x_i, z_i).$$

Any probability function $p(x, z)$ could be used to obtain an upper bound,
but we use a distribution that factors into $p(x)g(z)$; therefore, in the
ensemble, each set of $M$ decoded words, $\theta_1$, is independent of each
set of $M$ channel input words, $\theta_2$. Thus we can write

$$\Pr(\text{code}) = p(\theta_1, \theta_2) = p(\theta_1)\,p(\theta_2)
  = \prod_{i=1}^{M} g(z_i) \prod_{i=1}^{M} p(x_i).$$

Further, we use for the word distributions $p(x)$ and $g(z)$ the product forms

$$\prod_{m=1}^{n} p(x_m) \quad \text{and} \quad \prod_{m=1}^{n} g(z_m)$$

in which the letter probability distribution $p(x)$ is that which yields
a mutual information $C$ on $\mathcal{C}$ and the letter probability
distribution $g(z)$ is that which gives the output statistics on the test
channel for $\mathcal{S}$ at the point $(d^*, R^*)$ on the rate-distortion
curve.

The encoding and decoding are done as follows: In every ensemble
member there is a list $\theta_1$ of allowed decoded words and a list
$\theta_2$ of usable channel input words. When a source output $w$ occurs,
the encoder scans $\theta_1$ and chooses any member $z_u$ in this list for
which

$$d(w, z_u) \le d^*. \tag{54}$$

If there are none, the encoder chooses any member at all on the list
$\theta_1$, say $z_v$. Since the lists are chosen together, there corresponds
to $z_u$ or $z_v$ a particular $x$ in $\theta_2$, and this word is used to
transmit $w$. The decoder uses a maximum likelihood decision rule to decode
$y$ into a member of $\theta_2$, which is then associated, through the
pairings among the two lists, with a member $z$ in $\theta_1$. The resulting
distortion, by definition, is $d(w, z)$.
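The encoding rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word distortion is taken to be the average of the letter distortions, and the names `word_distortion`, `draw_code`, and `encode` are hypothetical helpers that do not appear in the paper.

```python
import random


def word_distortion(w, z, d):
    """Average letter distortion between equal-length words (an assumed form of d(w, z))."""
    return sum(d[a][b] for a, b in zip(w, z)) / len(w)


def draw_code(M, n, p_x, g_z, rng):
    """Draw the paired lists theta_1 (decoded words) and theta_2 (channel inputs).

    Letters are drawn independently from the letter distributions g_z and p_x,
    given as dicts mapping letter -> probability.
    """
    xs, zs = list(p_x), list(g_z)
    theta_1 = [tuple(rng.choices(zs, weights=[g_z[b] for b in zs], k=n)) for _ in range(M)]
    theta_2 = [tuple(rng.choices(xs, weights=[p_x[a] for a in xs], k=n)) for _ in range(M)]
    return theta_1, theta_2


def encode(w, theta_1, theta_2, d, d_star):
    """Threshold rule of equation 54.

    Returns (index, channel input word, success flag). If no list member is
    within d_star of w, an arbitrary member (here the first) is used.
    """
    for i, z in enumerate(theta_1):
        if word_distortion(w, z, d) <= d_star:
            return i, theta_2[i], True
    return 0, theta_2[0], False
```

For example, with a Hamming letter distortion `d = [[0, 1], [1, 0]]`, calling `encode((1, 1, 1, 0), [(0, 0, 0, 0), (1, 1, 1, 1)], [(0, 0), (1, 1)], d, 0.25)` selects the second list entry, while a source word at distortion 0.5 from both entries triggers the fallback.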

12.2 The Ensemble Average Distortion 

Each member, $\theta$, of the ensemble is a complete transmission system
in itself, and has an average transmission distortion dependent upon
the codes, $\theta_1$ and $\theta_2$, that are used. This average distortion,
which is an average over all possible source and channel events, is equal to

$$d(\theta) = d(\theta_1, \theta_2) = \sum_{W^n} p(w) \sum_{Y^n} p(y \mid x)\, d(w, z).$$

The ensemble average distortion is obtained by averaging $d(\theta_1, \theta_2)$
over all choices of $\theta_1$ and $\theta_2$, hence

$$\langle d(\theta) \rangle_{av} = \sum_{W^n} p(w) \sum_{Y^n}
  \Big[ \sum_{\theta_1} \sum_{\theta_2} p(y \mid x)\, d(w, z)\, p(\theta_1)\, p(\theta_2) \Big]. \tag{55}$$

We next separate the events $w$, $\theta_1$, $\theta_2$, and $y$ into two
sets: (i) those quadruples for which either there does not exist a $z$ in
$\theta_1$ satisfying equation 54 or the received word $y$ is decoded into a
member of $\theta_2$ different from the transmitted word $x(w)$, and (ii) its
complement. For quadruples in set one, the distortion $d(w, z)$ is surely
upper bounded by $d_{\max}$, the maximum entry in the distortion matrix
$\| d(w, z) \|$. For those in the second set, we use equation 54 and the fact
that the decoder returns us through $x(w)$ to $z$ to upper bound the
distortion by $d^*$. Therefore, if the characteristic function $\Phi$ is used
to indicate the quadruples in set one, we can upper bound the ensemble
average with

$$\langle d(\theta) \rangle_{av} \le \sum_{W^n} p(w) \sum_{Y^n} \sum_{\theta_1} \sum_{\theta_2}
  p(y \mid x)\, p(\theta_1)\, p(\theta_2) \big[ d^*(1 - \Phi) + d_{\max} \Phi \big]
  = d^* + (d_{\max} - d^*) \Pr(\Phi). \tag{56}$$

Finally, we use the union bound to upper bound $\Pr(\Phi)$ and the ensemble
average distortion, $\langle d(\theta) \rangle_{av}$, to upper bound the
minimum attainable transmission distortion, $d(\mathcal{S})$, and obtain the
result in the next theorem.

Theorem 8: The minimum attainable transmission distortion of the
source $\mathcal{S}$, when used with the channel $\mathcal{C}$, satisfies

$$d(\mathcal{S}) \le d^* + (d_{\max} - d^*)
  \big[ \Pr(\nexists\, z_u \text{ in } \theta_1) + \Pr(\text{channel error}) \big] \tag{57}$$

in which $\nexists$ means "there does not exist," $d^*$ is any distortion
greater than $d_c$, and $R$ (a variable in the bracketed terms) is any rate
in the interval $R^* < R < C$. The bound is a function of $n$ through the
quantity in the brackets.

The last term in the brackets, the probability of error on the channel,
has been approximated by many people, but we will use Gallager's
bound (Ref. 15)

$$\Pr(e) \le e^{-nE(R)} \tag{58}$$

in which $E(R)$ is a positive monotonically increasing function of the
difference $C - R$. The next section is devoted to the evaluation of the
first term in the brackets, which is the probability that the source
word $w$ and the list $\theta_1$ are such that equation 54 is not satisfied
for any $z$ in $\theta_1$.
XIII. THE PROBABILITY OF FAILURE AT THE ENCODER

We say that failure occurs at the encoder, for the source output $w$,
when each of the $M$ allowed decoded words on list $\theta_1$ is at a
distortion $d(w, z)$ from $w$ greater than $d^*$. Because each of the $M$
words in $\theta_1$ is selected independently, we can write the total
probability of this failure as

$$\Pr(\nexists\, z_u \text{ in } \theta_1)
  = \sum_{W^n} P(w) \Pr(\nexists\, z_u \text{ in } \theta_1 \mid w)
  = \sum_{W^n} P(w) \big[ 1 - \Pr(z \ni d(w, z) \le d^* \mid w) \big]^{M}. \tag{59}$$

The last probability is seen equal to the distribution function of the
distortion random variable described in Section 6.2 and defined by
equations 16 and 17. In these equations $q = (q_1, q_2, \ldots, q_H)$ is the
composition vector of the source word $w$, and $D_{ir}$ is the letter
distortion random variable between the $r$th appearance of the letter $w_i$
in $w$ and the corresponding letter in $z$.

We again notice that the distribution function of $d(w, z)$ depends
only upon the composition $q$ of $w$. Thus we are able to perform the
average over $W^n$ in equation 59 as one over all possible compositions
of $w$. All possible compositions can be represented as points in the
$(H - 1)$-dimensional hyperplane within the first quadrant of $R^H$ which
intersects each axis $q_i$ at one. This hyperplane is called the composition
space $Q^H$. The probability of any composition point is equal to the product
of the number of different source words having this composition and
the probability of each; therefore, we have

$$P(q) = \#(q) \prod_{i=1}^{H} p_i^{\,n q_i}
       = \frac{n!}{\prod_{i=1}^{H} (n q_i)!} \prod_{i=1}^{H} p_i^{\,n q_i}.$$
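As a small numerical illustration of the composition probability, the following Python sketch evaluates $P(q)$ as the multinomial count times the probability of one word. The function name `composition_probability` and the representation of a composition by its integer letter counts $n q_i$ are assumptions made here for convenience.

```python
from math import factorial, prod


def composition_probability(counts, p):
    """P(q) for the composition with letter counts n*q_i = counts[i].

    counts: number of occurrences of each source letter in the word
    p:      source letter probabilities, in the same order
    """
    n = sum(counts)
    n_words = factorial(n)          # n! / prod (n q_i)!  words share this composition
    for c in counts:
        n_words //= factorial(c)
    return n_words * prod(pi ** c for pi, c in zip(p, counts))
```

Summing this quantity over every composition of a fixed length gives one, which is a convenient sanity check when the integral over the composition space is replaced by a finite sum.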

Interpreting $P(q)$ as an impulse function over $Q^H$ we can now write
equation 59 as

$$\Pr(\nexists\, z_u \text{ in } \theta_1)
  = \int \cdots \int_{Q^H} P(q) \big[ 1 - G(d^* \mid q) \big]^{M} \, dq. \tag{60}$$

To continue the inequality in equation 57, we require a lower
bound to $G(d^*)$. For our present purpose, Fano's lower bound (Ref. 12) is
sufficient:

$$G(d^* \mid q) \ge K(n, q) \exp n[\mu(s, q) - s\mu'(s, q)]
              = K(n, q) \exp[-nR(d^*, q)] \tag{61}$$

in which

$$\mu'(s, q) = d^* \tag{62}$$

$$0 < d^* \le E(d \mid q) \tag{63}$$

$$\mu(s, q) = \sum_{i=1}^{H} q_i \ln \sum_{j=1}^{J} g_j \exp(s\, d_{ij})$$

and $K(n, q)$ is a rather complex function of $q$ and $n$ that goes to zero
algebraically in $n$ with increasing $n$. Its precise form is otherwise
unimportant in the following derivation. (The bound in equation 61 can
still be used for points $q$ that violate equation 63 if one uses the value
$s = 0$ rather than that which satisfies equation 62.) We can therefore
write

$$\Pr(\nexists\, z_u \text{ in } \theta_1) \le
  \int \cdots \int_{Q^H} P(q) \big[ 1 - K(n, q) \exp(-nR(d^*, q)) \big]^{\exp nR} \, dq. \tag{64}$$
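The exponent $R(d^*, q)$ is defined only parametrically through $s$, so a short numerical sketch may help. The Python below assumes the form of $\mu(s, q)$ given above and solves $\mu'(s, q) = d^*$ for $s \le 0$ by bisection; the function names and the bracketing interval `[-50, 0]` are illustrative choices, not part of the paper.

```python
from math import exp, log


def mu(s, q, g, d):
    """mu(s, q) = sum_i q_i ln sum_j g_j exp(s d_ij)."""
    return sum(q[i] * log(sum(g[j] * exp(s * d[i][j]) for j in range(len(g))))
               for i in range(len(q)))


def mu_prime(s, q, g, d):
    """Derivative of mu with respect to s (a tilted mean distortion)."""
    total = 0.0
    for i in range(len(q)):
        weights = [g[j] * exp(s * d[i][j]) for j in range(len(g))]
        total += q[i] * sum(w * d[i][j] for j, w in enumerate(weights)) / sum(weights)
    return total


def rate_exponent(d_star, q, g, d, lo=-50.0, hi=0.0, iters=200):
    """Return (s, R) with mu'(s, q) = d_star and R = s*mu'(s, q) - mu(s, q).

    Assumes mu'(lo) < d_star <= E(d | q) = mu'(0); mu' is increasing in s.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mu_prime(mid, q, g, d) < d_star:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return s, s * mu_prime(s, q, g, d) - mu(s, q, g, d)
```

For a binary composition $q = (1/2, 1/2)$, $g = (1/2, 1/2)$, and Hamming letter distortion, the exponent reduces to $\ln 2 - H(d^*)$ in nats, which the sketch reproduces.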

The next step is to divide the composition space $Q^H$ into two disjoint
subspaces, $Q$ and $Q'$, that are defined by

$$Q = \{ q : R(d^*, q) < R - \delta \} \tag{65}$$

$$Q' = \{ q : R(d^*, q) \ge R - \delta \} \tag{66}$$

with $\delta$ any positive number satisfying $R^* < R - \delta$. The idea
behind this separation is illustrated in Fig. 10. The bracketed term in the
integrand of equation 64 has the form $[1 - \exp(-nA)]^{\exp nB}$, which
approaches zero with increasing $n$ when $A < B$, and one when $A > B$.
In the first region, which, except for the $\delta$, corresponds to the set
$Q$, we shall use the upper bound

$$[1 - \exp(-nA)]^{\exp nB} \le \exp[-\exp n(B - A)] \tag{67}$$

and in the second region, corresponding to $Q'$, the (poorer) bound

$$[1 - \exp(-nA)]^{\exp nB} \le 1. \tag{68}$$
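The bound in equation 67 rests on the elementary inequality $\ln(1 - x) \le -x$. A minimal Python check of both sides is sketched below; the function names are hypothetical, the optional factor `K` stands in for $K(n, q)$, and the arguments are assumed small enough that `exp(n * B)` does not overflow.

```python
from math import exp, log1p


def bracket(n, A, B, K=1.0):
    """[1 - K exp(-nA)]^{exp(nB)}, evaluated through logarithms for stability.

    Requires K * exp(-n * A) < 1.
    """
    return exp(exp(n * B) * log1p(-K * exp(-n * A)))


def double_exponential_bound(n, A, B, K=1.0):
    """exp[-K exp n(B - A)], the right side of equation 67 with K retained."""
    return exp(-K * exp(n * (B - A)))
```

With $A < B$ both quantities collapse to zero very quickly as $n$ grows, while with $A > B$ the bracketed term tends to one, which is the behavior the division into $Q$ and $Q'$ exploits.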
Fig. 10 — The division of the composition plane $Q^H$ into the sets $Q$ and $Q'$.

The use of these bounds in equation 64 results in

$$\Pr(\nexists\, z_u \text{ in } \theta_1)
  \le \int \cdots \int_{Q} P(q) \exp\{ -K(n, q) \exp n[R - R(d^*, q)] \} \, dq
    + \int \cdots \int_{Q'} P(q)(1) \, dq$$

$$\le \int \cdots \int_{Q} P(q) \exp[-K(n, q) e^{n\delta}] \, dq + \Pr(Q')$$

$$\le \exp[-K(n) e^{n\delta}] + \Pr(Q') \tag{69}$$

in which $K(n)$ denotes the minimum of $K(n, q)$ over $Q$. The first term
in this upper bound is a double exponential in $n$ which will turn out
to be unimportant. Thus it remains to evaluate $\Pr(Q')$.

We shall use what we call the hypercube method to upper bound
$\Pr(Q')$. Although the resulting bound is not as tight as others that
could be derived (see, for example, the maximum probability point
method in Ref. 8), it has the advantage of being simpler both to derive
and to use and, in addition, does not seriously degrade the final bound
to transmission distortion. What is done is to enclose the set $Q'$ by
another set $Q_1'$ that has a relatively simple configuration, and to upper
bound $\Pr(Q')$ by $\Pr(Q_1')$.

We construct in $R^H$ a hypercube of side $2u$ centered at $q = p$,

$$K^H = \{ q : p_i - u \le q_i \le p_i + u \},$$

and intersect with it the composition space $Q^H$. The intersection forms
a "solid" $Q_1$,

$$Q_1 = Q^H \cap K^H,$$

which contains vertices of the form $q_v = (q_{1v}, q_{2v}, \ldots, q_{Hv})$,
with the components, of course, summing to one. When $H$ is even, $q_{iv}$
equals either $p_i + u$ or $p_i - u$, and when $H$ is odd, $q_{iv}$ has the
same values with the addition of one component equal to $p_i$. The vertices
of $Q_1$ are joined by straight lines.

At this point we use the fact that $Q$ is a convex set (Ref. 8); that is, for
$0 \le \lambda \le 1$, $\lambda q_a + (1 - \lambda) q_b$ is a member of $Q$
whenever both $q_a$ and $q_b$ are. This property ensures that whenever the
vertices of $Q_1$ are in the set $Q$, the entire set $Q_1$ is in $Q$,

$$Q_1 \subseteq Q,$$

with the consequence that

$$\Pr(Q') \le \Pr(Q_1'). \tag{70}$$

The remaining step is to bound the total probability of the set $Q_1'$.
Because this probability equals the probability that any of the dependent
events $q_i \notin [p_i - u, p_i + u]$ occurs, we can use the union bound to
upper bound $\Pr(Q_1')$ by the sum of the individual probabilities. Thus

$$\Pr(Q_1') \le \sum_{i=1}^{H} \big\{ \Pr[q_i < p_i - u] + \Pr[q_i > p_i + u] \big\}.$$

These quantities can be further upper bounded by a simple application
of Chernoff bounds. This has been done for us in Ref. 16, page
102, where the result found is, in our notation,

$$\Pr(Q_1') \le \sum_{i=1}^{H} \big( e^{-nX_i} + e^{-nY_i} \big) \tag{71}$$

in which

$$X_i,\, Y_i = d_i \ln \frac{d_i}{p_i} + (1 - d_i) \ln \frac{1 - d_i}{1 - p_i}$$

and

$$d_i = p_i - u \ \text{ for } X_i, \qquad d_i = p_i + u \ \text{ for } Y_i.$$

In these bounds, the hypercube side $2u$ should be maximized,
to obtain the tightest bound, subject only to the constraint that all
vertices $q_v$ be in region $Q$, that is, that they satisfy equation 65.
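To give a feel for how loose a Chernoff bound of this kind is, the following Python sketch compares the standard binomial Chernoff exponent with the exact upper tail for one composition component. It assumes that $n q_i$ is binomially distributed with parameter $p_i$ (which holds for an independent-letter source); the function names are hypothetical, and the exponent used is the textbook binomial form rather than a quotation from Ref. 16.

```python
from math import ceil, comb, exp, log


def chernoff_exponent(d, p):
    """d ln(d/p) + (1 - d) ln((1 - d)/(1 - p)), for 0 < d < 1 and 0 < p < 1."""
    return d * log(d / p) + (1 - d) * log((1 - d) / (1 - p))


def upper_tail_exact(n, p, d):
    """Exact Pr[q_i >= d] when n*q_i is binomial(n, p)."""
    k0 = ceil(n * d - 1e-12)  # small slack guards against float round-off in n*d
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k0, n + 1))


def upper_tail_chernoff(n, p, d):
    """Chernoff bound exp(-n * exponent) on Pr[q_i >= d], valid for d >= p."""
    return exp(-n * chernoff_exponent(d, p))
```

The bound has the correct exponential rate in $n$ but misses an algebraic factor, which is consistent with the text's remark that the hypercube method is simple rather than tight.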

The bound in equation 71 can be simplified still further by writing

$$\Pr(Q_1') \le 2H \exp[-n \min_i (X_i, Y_i)] = K_1 \exp[-nE_s(R)]. \tag{72}$$

Indeed, it can be shown (Ref. 8) that there are two, and not $2H$, candidates
for the minimizing quantity in the exponent.

XIV. THE SET OF UPPER BOUNDS 

Combining equations 57, 58, 69, and 72, we have the following result:

Theorem 9: The minimum attainable transmission distortion of the
source $\mathcal{S}$, when used with the channel $\mathcal{C}$, satisfies

$$d(\mathcal{S}) \le d^* + (d_{\max} - d^*) \big\{ \exp[-K(n) e^{n\delta}]
  + K_1 \exp[-nE_s(R)] + \exp[-nE(R)] \big\} \tag{73}$$

for any $d^*$ and $R$ that satisfy

$$d_{\max} \ge d^* > d_R > d_c \tag{74}$$

$$R^* < R < C. \tag{75}$$

The freedom provided by equations 74 and 75 can be used to generate
a set of upper bounds, corresponding to all possible choices of $d^*$ and
$R$, the properties of which depend upon those of the two exponential
functions in equation 73. It has been shown elsewhere (Ref. 8) that $E_s(R)$
is a positive monotone increasing function of the difference $R - R^*$,
that $E_s(R^*) = E_s'(R^*) = 0$, and that $E_s''(R^*) \ne 0$. Comparing these
with the corresponding properties of the channel reliability function
(Ref. 15), namely that $E(R)$ is a positive monotone increasing function of
the difference $C - R$, with $E(C) = E'(C) = 0$ and $E''(C) \ne 0$, we see
that the two functions are quite similar. Typically, their curves would look
like those in Fig. 11.

Fig. 11 — Typical behavior of $E_s(R)$ and $E(R)$ near their zero value.

With these curves, we can examine the behavior of the set of
bounds in Theorem 9. As shown in Fig. 12, when $d^*$ is chosen much
larger than $d_c$, the nonzero slope of the rate-distortion curve allows
a choice of $R$ that can make both the differences $C - R$ and $R - R^*$
large. In turn, the exponents $E_s(R)$ and $E(R)$ in equation 73 are large
and the exponential terms decay very rapidly with $n$. But for this
choice, the asymptote $d^*$ is much greater than the level $d_c$, which we
know can be approached.

Fig. 12 — The rate-distortion curve for $\mathcal{S}$ illustrating the
relations among the parameters in Theorem 9.

Fig. 13 — The upper bound in Theorem 9 with three different values for $d^*$
and $R$.

On the other hand, if we choose $d^*$ only slightly greater than $d_c$,
we have an upper bound with an asymptote that is nearly $d_c$, but
now the differences $C - R$ and $R - R^*$, and therefore the exponents
$E_s(R)$ and $E(R)$, are much smaller and the rate of approach to the
asymptote $d^*$ is correspondingly slower. Thus, in the selection of
$d^*$ and $R$ there is a trade-off between a small asymptotic value and
a fast rate of approach. This is illustrated in Fig. 13, in which we
show a set of curves obtained from the upper bound expressions in
equation 73. The best compromise for any value of $n$ is given by the
lower envelope to the entire set of bounds in equation 73; therefore
we have

Theorem 10: The minimum attainable transmission distortion of the
source $\mathcal{S}$, when used with the channel $\mathcal{C}$, satisfies

$$d(\mathcal{S}) \le \min_{d^*, R} d_U(n, d^*, R) = d_U(\mathcal{S}) \tag{76}$$

in which the function $d_U(n, d^*, R)$ is used to denote the right side of
equation 73.

In the next section we study the asymptotic behavior of the lower
envelope. At this point, though, we wish to include an important
conclusion that can be established from the set of upper bounds
in equation 73. Each individual bound indicates that, in a system
where the distortion level $d_c$ is attainable in the limit, if one would
tolerate a distortion $d^* = d_c + \Delta$, this level could be approached
exponentially fast as the coding block length is increased.

Actually, a much stronger statement is possible. Since the distortion
curve for $d^* = d_c + \tfrac{1}{2}\Delta$ approaches this level in the limit,
it must cross, at some finite $n$, the level $d_c + \Delta$. Because both
curves are for the same source and channel, this proves that the distortion
level $d_c + \Delta$ is not only approachable exponentially fast, it is in
fact attainable with a finite coding block length. This is true for any
$\Delta > 0$, no matter how small.
XV. THE ASYMPTOTIC BEHAVIOR OF THE UPPER BOUND

From the previous discussion it is clear that as $n$ increases, the
optimum value of $d^*$ must approach $d_c$ and therefore that the
exponents $E_s(R)$ and $E(R)$ must approach zero. For this reason we
use the Taylor series representations for these functions at $R^*$ and
$C$ in equations 73 and 76, respectively, and obtain

$$d_U(\mathcal{S}) \approx \min_{d^*, R} \big\{ d^* + (d_{\max} - d^*)
  \big[ K_1 \exp(-n b_1 (R - R^*)^2) + \exp(-n b_2 (C - R)^2) \big] \big\} \tag{77}$$

with $b_1 = \tfrac{1}{2} E_s''(R^*)$ and $b_2 = \tfrac{1}{2} E''(C)$. In using
the Taylor series for $E(R)$ and $E_s(R)$ we have dropped the cubic terms
since both $E'''(C)$ and $E_s'''(R^*)$ are finite and $C - R$ and $R - R^*$
are $o(1)$. The double exponential term involving $\delta$ is also dropped
since it can be shown to contribute nothing important in the asymptotic bound.

We next avoid the minimization on $R$ by choosing that value of $R$
which equates the two exponents:

$$b_1 (R - R^*)^2 = b_2 (C - R)^2. \tag{78}$$

While this selection of $R$ is nonoptimum for finite $n$, it can be shown
that it asymptotically approaches $R_{opt}$, and that it does not affect
the asymptotic behavior of the upper bound. This particular choice
of $R$ allows us to combine the two exponential terms in equation 77.
If we start with equation 78 and the obvious equality

$$(C - R) + (R - R^*) = C - R^*,$$

we can establish

$$R - R^* = \frac{\sqrt{b_2}}{\sqrt{b_1} + \sqrt{b_2}} (C - R^*) \tag{79}$$

$$C - R = \frac{\sqrt{b_1}}{\sqrt{b_1} + \sqrt{b_2}} (C - R^*) \tag{80}$$

which further allows us to write the two exponents in terms of the
common difference $C - R^*$.
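The algebra behind this choice of $R$ is easy to verify numerically. The Python sketch below assumes the splitting of $C - R^*$ in proportion to $\sqrt{b_2}$ and $\sqrt{b_1}$ that follows from equation 78; the function names are illustrative only.

```python
from math import sqrt


def equalizing_rate(R_star, C, b1, b2):
    """Rate R with b1 (R - R*)^2 = b2 (C - R)^2 and R* < R < C."""
    r1, r2 = sqrt(b1), sqrt(b2)
    return R_star + r2 / (r1 + r2) * (C - R_star)


def common_exponent(R_star, C, b1, b2):
    """Shared value of the two quadratic exponents at the equalizing rate."""
    return b1 * b2 / (sqrt(b1) + sqrt(b2)) ** 2 * (C - R_star) ** 2
```

The common exponent is what later produces the factor $b_1 b_2 / (\sqrt{b_1} + \sqrt{b_2})^2$ in the constant $B$.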

Next, we wish to express the difference $C - R^*$ in terms of the
difference $d_c - d^*$. Taylor's formula with remainder is again used:

$$R(d^*) = R(d_c) + R'(d_c)(d^* - d_c) + o(d^* - d_c)$$

or, since $R(d^*) = R^*$ and $R(d_c) = C$,

$$C - R^* = -R'(d_c)(d^* - d_c) - o(d^* - d_c)
          = -s (d^* - d_c) - o(d^* - d_c). \tag{81}$$

In the last equation we have used the fact that the slope of the
rate-distortion curve at the point $(d_c, C)$ is equal to the value of $s$
which satisfies $\mu(s) - s\mu'(s) = -C$ (Refs. 7, 8).

Finally, we substitute equations 79, 80, and 81 into equation 77,
subtract $d_c$ from both sides of this last equation, and change the
minimizing variable to $d^* - d_c$ to obtain

$$d(\mathcal{S}) - d_c \le \min_x \big[ x + (A - x) K_2 \exp(-Bnx^2) \big] \tag{82}$$

in which $x = d^* - d_c$, $A = d_{\max} - d_c$, $K_2 = K_1 + 1 = 2H + 1$, and

$$B = \frac{b_1 b_2 s^2}{(\sqrt{b_1} + \sqrt{b_2})^2}.$$

We next find the asymptotic behavior of the lower envelope in equation 82.

If $x$ is considered the parameter, each function of $n$ in the set
$f(x, n)$ starts at $f(x, 0) = x + (A - x) K_2$ and decreases exponentially
to $f(x, \infty) = x$. For any two parameter values, $x_1$ and $x_2$, with
$x_1 > x_2$, we have

$$f(x_1, 0) - f(x_2, 0) = (1 - K_2)(x_1 - x_2)
  = -2H [ f(x_1, \infty) - f(x_2, \infty) ].$$

Consequently, any two curves must cross as in Fig. 14.
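The crossing can be located numerically. The sketch below, with hypothetical function names, evaluates $f(x, n) = x + (A - x) K_2 \exp(-Bnx^2)$ and finds by bisection the block length at which the curve with the larger parameter overtakes the one with the smaller parameter; it treats $n$ as a continuous variable and assumes the crossing lies below `n_hi`.

```python
from math import exp


def f(x, n, A, K2, B):
    """One member of the family of bounds in equation 82."""
    return x + (A - x) * K2 * exp(-B * n * x * x)


def crossing_block_length(x1, x2, A, K2, B, n_hi=1e9, iters=200):
    """Block length n at which f(x1, n) = f(x2, n), for x1 > x2.

    The gap f(x1, n) - f(x2, n) is negative at n = 0 and positive for large n.
    """
    lo, hi = 0.0, n_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(x1, mid, A, K2, B) - f(x2, mid, A, K2, B) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Below the crossing the larger parameter gives the smaller bound, and above it the smaller parameter does, which is why the minimizing parameter must change with $n$.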

It follows that the parameter $x_0(n)$, which identifies the minimum
of $f(x, n_0)$ at the value $n = n_0$, must change with $n$. Since this
parameter is the solution of

$$\frac{\partial f(x, n)}{\partial x} \Big|_{x = x_0} = 0,$$

we have

$$\exp(nBx_0^2) - K_2 = 2nK_2 B x_0 (A - x_0). \tag{83}$$

Figure 15 shows the required graphical solution, which clearly always
exists. The substitution of $x_0(n)$ in $f(x, n)$ specifies the single
function of $n$, $f[x_0(n), n]$, which is the desired lower envelope.
Unfortunately, an explicit solution is not possible for $x_0(n)$, nor for
$f[x_0(n), n]$, but we can obtain bounds to both that are adequate for
our purposes.
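Although no closed form exists, the stationary point is easy to find numerically. The Python sketch below is one possible approach, not the paper's: it applies bisection to the derivative of $f$ written in a form that avoids overflow, and assumes $nBA^2 > \ln K_2$ so that the derivative changes sign on $(0, A)$. All function names are hypothetical.

```python
from math import exp, log, sqrt


def f(x, n, A, K2, B):
    """Bound of equation 82 for a fixed parameter x."""
    return x + (A - x) * K2 * exp(-B * n * x * x)


def dfdx(x, n, A, K2, B):
    """Derivative of f in x; its zero is the solution of equation 83."""
    return 1.0 - K2 * exp(-B * n * x * x) * (1.0 + 2.0 * B * n * x * (A - x))


def envelope_point(n, A, K2, B, iters=200):
    """Minimizing parameter x0(n), found by bisection on the sign of df/dx."""
    lo, hi = 0.0, A
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dfdx(mid, n, A, K2, B) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def envelope_ratio(n, A, K2, B):
    """Lower envelope divided by the leading term [(ln n) / (2 B n)]^(1/2)."""
    x0 = envelope_point(n, A, K2, B)
    return f(x0, n, A, K2, B) / sqrt(log(n) / (2.0 * B * n))
```

With the illustrative values $A = 1$, $K_2 = 5$, $B = 1$, the ratio returned by `envelope_ratio` stays above one and decreases slowly as $n$ grows, in line with the logarithmic correction terms discussed in the text.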



Fig. 14 — Two members of the family of curves $f(x, n) = x + (A - x) K_2 \exp(-Bnx^2)$.



From the graphical solution in Fig. 15, we see that any conjectured
solution, $\tilde{x}_0$, must be too large if, in equation 83, the left side
exceeds the right and too small if the reverse is true. This criterion
could also be used on a trial functional solution $\tilde{x}_0(n)$. Now, if
the left side of equation 83 is functionally stronger in $n$ than the right,
we know that our trial solution $\tilde{x}_0(n)$ is too strong in $n$. Again
the reverse is also true.

After several guesses we are led to the trial functional solution
$\tilde{x}_0(n) = [\alpha (\ln n)/Bn]^{1/2}$, with which the right side of
equation 83 is greater than the left for $\alpha < \tfrac{1}{2}$, and the
reverse is true for $\alpha > \tfrac{1}{2}$.





Fig. 15 — The graphical solution of equation 83.
This determines the highest order term of $x_0(n)$ and we can write

$$\left( \frac{\ln n}{2Bn} \right)^{1/2} [1 - o(1)] \le x_0(n)
  \le \left( \frac{\ln n}{2Bn} \right)^{1/2} [1 + o(1)].$$

It follows that

$$f[x_0(n), n] \ge x_0(n) \ge \left( \frac{\ln n}{2Bn} \right)^{1/2} [1 - o(1)]$$

and, since the lower envelope is smaller than any individual $f(x, n)$,
that

$$f[x_0(n), n] \le \left( \frac{\ln n}{2Bn} \right)^{1/2} [1 + o(1)]. \tag{84}$$

Although only an upper bound to $f(x_0, n)$ is required, both upper
and lower bounds were found to show that the method used to obtain
the desired lower envelope provides asymptotically tight results.
Continuing the inequality in equation 82 by that in equation 84 provides
our final upper bound to transmission distortion.

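Theorem 11: The minimum attainable transmission distortion of the
source $\mathcal{S}$, when used with the channel $\mathcal{C}$, is upper
bounded by

$$d(\mathcal{S}) \le d_c + b \left( \frac{\ln n}{n} \right)^{1/2} [1 + o(1)] \tag{85}$$

in which

$$b = \left( \frac{1}{2B} \right)^{1/2}
    = \frac{1}{\sqrt{2}\,|s|} \left[ \left( \frac{1}{b_1} \right)^{1/2}
      + \left( \frac{1}{b_2} \right)^{1/2} \right]$$

$$b_1 = \tfrac{1}{2} E_s''(R^* = C)$$

$$b_2 = \tfrac{1}{2} E''(C).$$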

For a fixed source $\mathcal{S}$, we see from this theorem that the
coefficient $b$ is smallest when $\mathcal{S}$ is used with that channel
(among those of equal capacity) for which the constant $b_2$ is largest. In
the same way, the coefficient $b$ is seen to be a decreasing function of
$b_1$ when the channel is fixed. Since the constant $b_2$ is independent of
the source and $b_1$ independent of the channel, our upper bound does not
provide an indicator of matching between the source and channel as we
obtained in the lower bound. This was actually expected since here we were
forced to separate the source and channel with an interface containing at
most $e^{nC}$ points.

The coefficient $b_1$, though, has an interesting significance. It is
equal to one-half the second derivative $E_s''(R^* = C)$, which can be
thought to indicate how fast the boundary of $Q'$ initially moves away from
$p$ with increasing $R$. In turn, this indicates, in a reciprocal manner, the
necessary rate of change of the rate required to handle source words with
compositions just around $p$, which are just less than typical. Thus, we
can think of the coefficient $b_1$ as a type of "stretch factor" (Ref. 10)
for the source.

When the result in equation 85 is compared with the lower bound to
distortion, we see that the $[(\ln n)/n]^{1/2}$ rate of approach to $d_c$ is
slower than the $1/n$ rate of approach of the lower bound. Mathematically,
at least, the reason for the upper bound decreasing more slowly than
$(1/n)^{1/2}$ is that, for small arguments, the lowest order term in the two
exponents $E(R)$ and $E_s(R)$ is quadratic. Their form for large $n$,
$\exp[-n(\Delta R)^2]$, shows that values of $\Delta R$ larger than
$(1/n)^{1/2}$ are required to have these terms go to zero with increasing
$n$. Because the slope of the rate-distortion curve is nonzero, the
corresponding values of distortion difference $(\Delta d)$ must also be
larger than $(1/n)^{1/2}$.

There is reason to think that this type of exponential term, and the
consequential $[(\ln n)/n]^{1/2}$ rate of approach to $d_c$, is present in
the upper bound because we have used threshold devices in the transmission
system. One at the encoder leads to the first exponential term in equation
73 (we again disregard the double exponential term). It uses the
rule in equation 54 to choose, for each source word $w$, any decoder word
$z$ in list $\theta_1$ at a distortion less than $d^*$. When list $\theta_1$
is lacking such an entry, any $z$ at all on the list is chosen which, since
the members of $\theta_1$ are chosen independently, is then independent of
$w$. The resulting distortion in this circumstance is usually much greater
than $d^*$. In the next section we compare the performance of this encoder
with another that does not use such a threshold and show that the source
encoding alone need only contribute to a rate of approach to $d_c$ equal to
$(\ln n)/n$.

A second threshold operation in our system is at the channel decoder,
but it is really dependent upon the coding of the entire system. It leads
to the second exponential term in equation 73. To isolate its effect on
the system performance, we assume that failure has not occurred at the
encoder, that is, there does exist a $z_u$ on $\theta_1$ with
$d(w, z_u) \le d^*$. Now if the channel decoder makes no error, we are
assured that the resulting distortion is less than $d^*$. However, if an
error is made, the believed channel input word $x'$ is different from the
actual word $x$; therefore the decoded word $z'$ is different from $z_u$.
Moreover, since the lists $\theta_1$ and $\theta_2$ are chosen independently,
$z_u$ and $z'$ are statistically independent. It follows that $z'$ and $w$
are also statistically independent, and in consequence that the distortion
$d(w, z')$ is usually much greater than $d^*$.
It is this threshold which, it is believed, cannot be eliminated when
the signal space is constrained to contain at most $M = e^{nC}$ points, even
if the lists $\theta_1$ and $\theta_2$ are chosen dependently. A heuristic
argument in Ref. 8 suggests that with such a constrained signal set, the
transmission distortion can approach $d_c$ no more rapidly than as
$n^{-1/2}$. This, of course, is a slower rate of approach to $d_c$ than the
$a/n$ rate of approach of the corresponding lower bound to distortion that
was derived using a signal set not constrained in size.

XVI. AN IMPROVED UPPER BOUND FOR NOISELESS CHANNELS 

For the special case of a noiseless channel, the previously derived
upper bound can be improved. Since such a channel contains $e^C$ noiseless
transitions, or "direct" paths, transmission of the encoder output
is trivial and the communication problem is only one of source
representation. For this representation we are allowed to choose, from
an $e^C$-letter representation alphabet, one representation letter for
every source output letter. Just as one is allowed $n$ uses of the channel
to transmit an $n$-letter source output, one is allowed an $n$-letter
representation word to approximate an $n$-letter source word.

We first state that if the threshold source encoder defined by equation
54 is used in the ensemble of representation codes $\theta_1$ of Section
XII, the ensemble average representation error is very similar to the
ensemble average transmission error derived in the previous sections.
The only difference in the derivation is that the Pr(channel error)
term is no longer present in equation 57, nor in any succeeding equation,
with the only result being that $b_2 = \infty$ in equation 85.

We note here that this particular result is valid only for sources
that are not doubly-uniform, that is, having a uniform probability
distribution and a distortion matrix in which all rows are permutations
of one row vector and all columns are permutations of one column
vector. The reason for this exclusion is that for doubly-uniform
sources the exponential term in equation 73 involving $E_s(R)$ also
vanishes, and the double exponential term involving $\delta$, previously
dropped as insignificant, now remains as the only term. It is instructive
to delay further evaluation of the bound in this case until after
the following upper bound to representation distortion is derived.

16.1 Optimum Source Encoder 

We now derive an upper bound to the source representation error
when an optimum source encoder is used in place of the threshold
encoder of the previous section. The resulting upper bound will be
seen to approach the asymptote, $d_c$, as $(\ln n)/n$. This represents an
improvement upon the best previously known upper bound to source
representation distortion (Ref. 7), which approached $d_c$ essentially as
$n^{-1/2}$.

The coding ensemble used here is very similar to the set of codes,
$\theta_1$, used in Section XII. But now the size of the set, $M$, is set
equal to $e^{nC}$ for all $n$, rather than have it approach this size with
increasing $n$. And, the probability with which each ensemble member is used,

$$\Pr(\text{code}) = p(\theta_1) = \prod_{i=1}^{M} g(z_i),$$

is now governed by that probability distribution $g(z)$ equal to the output
probability distribution of the test channel at the point $(d_c, C)$ on
the rate-distortion curve for $\mathcal{S}$. Within each ensemble member the
encoder chooses, for any occurring source word $w$, that member $z$
on $\theta_1$ for which $d(w, z)$ is minimum. Therefore, for each ensemble
member the average distortion over all possible source events is

$$d(\theta_1) = \sum_{W^n} p(w) \Big[ \min_{1 \le i \le M} d(w, z_i) \Big]. \tag{86}$$

The ensemble average distortion is given by

$$\langle d(\theta_1) \rangle_{av} = \sum_{W^n} p(w) \sum_{\theta_1} p(\theta_1)
  \Big[ \min_{\substack{1 \le i \le M \\ z_i \in \theta_1}} d(w, z_i) \Big]. \tag{87}$$

The set of quantities $d(w, z_i)$ in equation 87 could be thought of as
a set of $M$ independent and identically distributed random variables,
each conditioned on $w$ and governed by the word probability distribution
$g(z)$. The minimum of this set, $d_{\min}(w)$, is then also a random
variable, governed by the code probability distribution $p(\theta_1)$. The
inner sum in equation 87 is, therefore, the expected value of $d_{\min}(w)$
and we can write

$$\langle d(\theta_1) \rangle_{av} = \sum_{W^n} p(w)
  \int_0^{d_{\max}} d \; dF_{d_{\min} \mid w}(d \mid w)$$

which, upon integration by parts, becomes

$$\langle d(\theta_1) \rangle_{av} = \sum_{W^n} p(w)
  \int_0^{d_{\max}} \big[ 1 - F_{d_{\min} \mid w}(d \mid w) \big] \, dd. \tag{88}$$
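The integration-by-parts step says that the mean of a nonnegative random variable equals the integral of one minus its distribution function, and for the minimum of $M$ independent draws that integrand is $[1 - G(d)]^M$. The following Python sketch checks this on a small discrete example; the function names and the toy distribution are assumptions made for illustration only.

```python
from itertools import product


def expected_min_bruteforce(values, probs, M):
    """E[min of M i.i.d. draws] by enumerating all M-tuples (small M only)."""
    total = 0.0
    for idx in product(range(len(values)), repeat=M):
        pr = 1.0
        for i in idx:
            pr *= probs[i]
        total += pr * min(values[i] for i in idx)
    return total


def expected_min_tail_integral(values, probs, M):
    """Integral of [1 - G(d)]^M over d >= 0, with G the distribution function.

    Assumes all values are nonnegative; G is piecewise constant between them.
    """
    total, cdf, prev = 0.0, 0.0, 0.0
    for v, p in sorted(zip(values, probs)):
        total += (v - prev) * (1.0 - cdf) ** M  # G(d) = cdf on [prev, v)
        cdf += p
        prev = v
    return total
```

For distortion values `[0.0, 0.5, 1.0]` with probabilities `[0.2, 0.5, 0.3]` and `M = 3`, both functions give the same expected minimum.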

The conditional distortion random variables $d(w, z_i)$ are the same
distortion variables used in Section XIII. Since they depend only upon
the composition of $w$, we can again perform the summation in equation
88 by integration over the composition space, thus

$$\langle d(\theta_1) \rangle_{av} = \int \cdots \int_{Q^H} P(q) \, dq
  \int_0^{d_{\max}} \big[ 1 - F_{d_{\min} \mid q}(d \mid q) \big] \, dd \tag{89}$$

$$= \int \cdots \int_{Q^H} P(q) \, dq \, \langle d_{\min}(q) \rangle_{av}. \tag{90}$$

The inner integrand in equation 89 is the probability that all $M$
points on $\theta_1$ have a distortion $d(w, z)$ from $w$ greater than $d$.
Using the independence property of the members of $\theta_1$, we can write
this probability as

$$1 - F_{d_{\min} \mid q}(d \mid q) = \big[ 1 - G(d \mid q) \big]^{M}. \tag{91}$$

It can be seen from equation 16 that the variance of the variable d is 
proportional to 1/n for every q. Therefore the function [1 - G(d | q)], 
which for every n decreases monotonically from one to zero, approaches, 
with increasing n, a negative step at the value of distortion d = E{d | q). 
The same is also true of [1 - G{d \ q)] M which approaches a negative 
step at some lower value of distortion, d c (q). This can be established 
using the following asymptotic upper and lower bounds to the dis- 
tribution function G(d | q) which are from Shannon 11 and Gallager": 

h(n, q) exp -nR(d, q) ^ G(d | q) ^ H(n, q) exp -nR(d, q) (92) 
with 

R(d, q) = m(s, q) - s M '(s, q) (93) 

< m'(s, q) = d^ E(d\ q) 

and in which h(n, q) and H(n, q) are algebraically small functions of n. 
Therefore, within the range < d ^ E(d | q), the function in equation 
91 can be bounded by 

[1 _ ff<f«*]«p "'' ^ [1 - G(d | q)]' w £ [1 - he-**]™ "''; 

(94) 

which proves that $[1 - G(d \mid q)]^{M}$ must approach one when $R(d, q) > C$
and zero when $R(d, q) < C$. That the function $R(d, q)$ is monotone
decreasing in $d$ within $0 < d \le E(d \mid q)$ now establishes the stated
limiting step function form of $[1 - G(d \mid q)]^{M}$, with $d_c(q)$ equal to the
distortion value for which

$$R[d_c(q), q] = C. \qquad (95)$$
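The limiting step can be checked numerically in a simple special case. The sketch below is only an illustration under assumptions that are not in the text: a binary alphabet, per-letter Hamming distortion, and representation letters drawn equiprobably, so that $n\,d$ is Binomial$(n, 1/2)$ and $R(d) = \ln 2 - H_b(d)$ in nats. The function names are hypothetical.

```python
import math

def rate(d):
    """Assumed exponent R(d) = ln 2 - H_b(d) in nats, for 0 < d < 1/2."""
    return math.log(2.0) + d * math.log(d) + (1.0 - d) * math.log(1.0 - d)

def G(n, d):
    """Exact Pr{distortion <= d} when n*d is Binomial(n, 1/2) (assumed model)."""
    k_max = math.floor(n * d + 1e-9)
    return sum(math.comb(n, k) for k in range(k_max + 1)) / 2 ** n

def all_miss_probability(n, d, C):
    """[1 - G(d)]^M with M = exp(nC): no representation word lies within d."""
    M = math.exp(n * C)
    # log1p keeps precision when G is far below machine epsilon.
    return math.exp(M * math.log1p(-G(n, d)))

if __name__ == "__main__":
    C = 0.3  # nats per letter, an arbitrary illustrative capacity
    for n in (50, 100, 200, 400):
        row = [all_miss_probability(n, d, C) for d in (0.08, 0.12, 0.15, 0.18)]
        print(n, ["%.3g" % x for x in row])
```

With $C = 0.3$ nats, $R(0.08) > C$ and $R(0.18) < C$, so as $n$ grows the printed values should move toward one at $d = 0.08$ and toward zero at $d = 0.18$.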
The region of integration in equation 89 is thus conveniently divided
into two parts: one over $[0, d_c(q) + \Delta]$, in which the integrand is
upper bounded by unity, and the other over $[d_c(q) + \Delta, d_{\max}]$, in which
the integrand is upper bounded by its value at the lower limit. The
result is

$$\langle d_{\min}(q)\rangle_{\mathrm{av}} \le d_c(q) + \Delta + [d_{\max} - d_c(q) - \Delta]\,[1 - G(d_c(q) + \Delta \mid q)]^{M} \qquad (96)$$

which, with the use of the lower bound in equation 92, can be continued
by

$$\langle d_{\min}(q)\rangle_{\mathrm{av}} \le d_c(q) + \Delta + [d_{\max} - d_c(q) - \Delta]\,\{1 - h \exp[-nR(d_c(q) + \Delta, q)]\}^{\exp nC}.$$

Equation 67 allows the further continuation of this bound by:

$$\langle d_{\min}(q)\rangle_{\mathrm{av}} \le d_c(q) + \Delta + [d_{\max} - d_c(q) - \Delta]\,\exp(-h \exp\{n[C - R(d_c(q) + \Delta, q)]\}). \qquad (97)$$

Again the monotone decreasing property of $R(d, q)$ in $d$ provides that
the quantity $C - R(d_c(q) + \Delta, q)$ is positive when $\Delta$ is positive and,
therefore, that the last term in equation 97 is a decreasing double
exponential in $n$.
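The trade-off that equation 97 sets up between the asymptote $d_c(q) + \Delta$ and the speed of approach can be seen by evaluating the right side on a grid of $\Delta$. The sketch below is a hedged illustration, not the paper's computation: it assumes a binary example with $R(d) = \ln 2 - H_b(d)$ (nats), $d_{\max} = 1$, and $h = n^{-1/2}$ with unit constant. All function names are hypothetical.

```python
import math

LN2 = math.log(2.0)

def rate(d):
    """Assumed R(d) = ln 2 - H_b(d), decreasing on (0, 1/2]."""
    if d >= 0.5:
        return 0.0
    return LN2 + d * math.log(d) + (1.0 - d) * math.log(1.0 - d)

def critical_distortion(C):
    """Bisection for d_c with R(d_c) = C, valid for 0 < C < ln 2."""
    lo, hi = 1e-12, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if rate(mid) > C:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bound_97(n, delta, C, d_max=1.0):
    """Right side of equation 97 under the assumptions stated above."""
    d_c = critical_distortion(C)
    h = n ** -0.5  # assumed form of the coefficient, unit constant
    exponent = min(n * (C - rate(d_c + delta)), 700.0)  # avoid float overflow
    return d_c + delta + (d_max - d_c - delta) * math.exp(-h * math.exp(exponent))

def best_excess(n, C, grid=2000):
    """Smallest value of (bound - d_c) over a grid of delta in (0, 1/2 - d_c]."""
    d_c = critical_distortion(C)
    deltas = [(0.5 - d_c) * k / grid for k in range(1, grid + 1)]
    return min(bound_97(n, dl, C) for dl in deltas) - d_c

if __name__ == "__main__":
    for n in (100, 300, 1000):
        print(n, best_excess(n, 0.3))
```

For a fixed $\Delta$ the bound settles at $d_c + \Delta$ once $n$ is large, while the grid minimum over $\Delta$ keeps shrinking with $n$.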

Equation 97 actually provides, for each $q$, a set of upper bounds to
$\langle d_{\min}(q)\rangle_{\mathrm{av}}$ very similar to the family of curves studied in Section XV.
In the choice of the parameter $\Delta$ there is once again a trade-off between
a small asymptote, $d_c(q) + \Delta$, and a fast rate of approach. It should,
in general, be chosen to optimize the bound at each $n$. Since we want an
upper bound to $\langle d_{\min}(q)\rangle_{\mathrm{av}}$ that approaches $d_c(q)$ with increasing $n$,
the optimizing parameter $\Delta_0(n)$ clearly must approach zero as $n$
increases. But $\Delta_0(n)$ must approach zero in a way that also allows the
last term of equation 97 to vanish.

Since an asymptotic bound is our goal, we extract the essential
behavior of this term for small $\Delta$ by forming a Taylor series of $R(d, q)$
at $d = d_c(q)$:

$$C - R(d_c(q) + \Delta, q) = -\Delta R'(d_c(q), q) + o(\Delta)$$

$$= -s\Delta + o(\Delta).$$

In this expression $s$ is the parameter value in equation 93 when $d$ equals
$d_c(q)$. Thus the lower envelope to the set of bounds in equation 97
can be written, for the purpose of an asymptotic bound, as

$$\langle d_{\min}(q)\rangle_{\mathrm{av}} \le \min_{\Delta} \left\{ d_c(q) + \Delta + [d_{\max} - d_c(q) - \Delta] \exp\left(-h e^{-ns\Delta}\right) \right\}.$$
The minimization is found using the same method used in Section XV.
In this process, it is important to notice that Shannon's coefficient
$h(n, q)$ in equation 92 is proportional to $n^{-1/2}$. The result is that the
optimizing parameter satisfies

$$\frac{1}{2}\,\frac{\ln n}{|s|\,n}\,[1 + o(1)] \le \Delta_0(n) \le \left(\frac{1}{2} + \varepsilon\right) \frac{\ln n}{|s|\,n}\,[1 + o(1)]$$

and that $\langle d_{\min}(q)\rangle_{\mathrm{av}}$ satisfies

$$\langle d_{\min}(q)\rangle_{\mathrm{av}} \le d_c(q) + \left(\frac{1}{2} + \varepsilon\right) \frac{\ln n}{|s|\,n}\,[1 + o(1)]. \qquad (98)$$
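The $(\ln n)/n$ behavior of the optimizing parameter can be illustrated by minimizing a simplified form of the lower envelope numerically. The sketch below makes several assumptions that are not in the text: the factor $[d_{\max} - d_c(q) - \Delta]$ is replaced by a constant `D`, the coefficient is taken as $h = n^{-1/2}$ with unit constant, and $|s| = 1$. The function names are hypothetical.

```python
import math

def optimal_delta(n, s_abs=1.0, D=1.0):
    """Minimize delta + D*exp(-n**-0.5 * exp(n*s_abs*delta)) by ternary search.

    With y = n*s_abs*delta - 0.5*ln(n), the objective is unimodal for y >= 0
    (provided n*s_abs*D > e), so the search is restricted to y in [0, 60].
    """
    half_log = 0.5 * math.log(n)

    def objective(delta):
        y = n * s_abs * delta - half_log
        return delta + D * math.exp(-math.exp(y))

    lo = half_log / (n * s_abs)
    hi = (half_log + 60.0) / (n * s_abs)
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def normalized_delta(n, s_abs=1.0, D=1.0):
    """Optimal delta divided by ln(n)/(s_abs*n); the asymptotic value is 1/2."""
    return optimal_delta(n, s_abs, D) * n * s_abs / math.log(n)

if __name__ == "__main__":
    for n in (10**4, 10**8, 10**12):
        print(n, normalized_delta(n))
```

The normalized value decreases toward 1/2 only slowly, because the optimum carries a correction of order $(\ln \ln n)/n$; this is the kind of lower-order term that the factor $(\tfrac{1}{2} + \varepsilon)[1 + o(1)]$ in equation 98 absorbs.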

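Returning to equation 90, the ensemble average representation error
therefore can be upper bounded by

$$\langle d(S_I)\rangle_{\mathrm{av}} \le \int \cdots \int P(q) \left[ d_c(q) + \left(\frac{1}{2} + \varepsilon\right) \frac{\ln n}{|s(q)|\,n} \right] dq. \qquad (99)$$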
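The above integral is evaluated in the same way similar averages
were found for the lower bound. The bracketed quantity is expanded
in a Taylor series about $q = p$ and is truncated after three terms with
a Lagrange remainder term. Upon integration of this expansion we find

$$\langle d(S_I)\rangle_{\mathrm{av}} \le d_c(p) + \left(\frac{1}{2} + \varepsilon\right) \frac{\ln n}{|s|\,n}$$

$$+ \sum_i \frac{\partial}{\partial q_i} \left[ d_c(q) + \left(\frac{1}{2} + \varepsilon\right) \frac{\ln n}{|s(q)|\,n} \right]_{q = p} E(q_i - p_i)$$

$$+ \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial q_i\, \partial q_j} \left[ d_c(q) + \left(\frac{1}{2} + \varepsilon\right) \frac{\ln n}{|s(q)|\,n} \right]_{q = \tilde{q}} E[(q_i - p_i)(q_j - p_j)] \qquad (100)$$

with $s = s(p)$ and $\tilde{q} \in Q_u$.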

Using the following expected values in equation 100,

$$E(q_i - p_i) = 0$$

$$E[(q_i - p_i)(q_j - p_j)] = \frac{1}{n}\,(p_i \delta_{ij} - p_i p_j),$$

we have the following upper bound to the ensemble average distortion 
and, therefore, to the minimum attainable representation error. 

Theorem 12: The minimum attainable transmission distortion
(representation distortion) of the source S, when used with a noiseless
channel of capacity $C$, is upper bounded by

$$d(S) \le d_c + \left(\frac{1}{2} + \varepsilon\right) \frac{\ln n}{|s_0|\,n}\,[1 + o(1)] \qquad (101)$$

in which $s_0$ satisfies

$$\mu(s_0, p) - s_0\,\mu'(s_0, p) = -C.$$

Except for the arbitrarily small positive $\varepsilon$, the bound in equation 101
agrees precisely with the asymptotic lower bound that we found earlier
in this paper.
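The two expected values substituted into equation 100 are the mean and covariance of the composition of $n$ independent source letters, which is a multinomial count vector divided by $n$. The sketch below checks them by exact enumeration for a small case. The alphabet size, the probabilities, and the function name are illustrative assumptions.

```python
from itertools import product
from math import factorial, prod

def composition_moments(p, n):
    """Exact E(q_i - p_i) and E[(q_i - p_i)(q_j - p_j)] for q = counts / n."""
    K = len(p)
    mean = [0.0] * K
    cov = [[0.0] * K for _ in range(K)]
    for counts in product(range(n + 1), repeat=K):
        if sum(counts) != n:
            continue
        # Multinomial coefficient n! / (c_1! ... c_K!), computed exactly.
        coef = factorial(n)
        for c in counts:
            coef //= factorial(c)
        prob = coef * prod(pi ** c for pi, c in zip(p, counts))
        dev = [c / n - pi for c, pi in zip(counts, p)]
        for i in range(K):
            mean[i] += prob * dev[i]
            for j in range(K):
                cov[i][j] += prob * dev[i] * dev[j]
    return mean, cov

if __name__ == "__main__":
    p, n = [0.5, 0.3, 0.2], 6
    mean, cov = composition_moments(p, n)
    print(mean)
    print(cov)
```

Because the covariance is exactly of order $1/n$, the second-order Taylor term in equation 100 contributes only $O(1/n)$, which is of lower order than the $(\ln n)/n$ term that survives in equation 101.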

We see by comparing equation 85 (with $b_2 = \infty$ for the noiseless
channel) and equation 101 that the replacement of the threshold source
encoder with an optimum encoder increases the rate of approach to
the asymptote from $[(\ln n)/n]^{1/2}$ to $(\ln n)/n$. To obtain some feeling
for the reason for this improvement, we might think of the optimum
encoder as a threshold encoder, but with a threshold that varies
depending on the particular source output. Indeed, we used this step
within the mathematics when we separated all events (equation 96)
into two sets with the separation dependent upon the source word. In
particular, for any source output word with composition $q$, we used
a threshold, $d_c(q) + \Delta$, just large enough so that for large $n$ there is
almost surely a representation word in the representation set that is
acceptable. The optimum encoder does not require, as does the fixed
threshold encoder, that the set of source words not meeting a fixed
distortion level of $d^*$ have a total probability that goes to zero with
$n$. This restriction is really more severe than one would think we
need, since some of the source words $w$ discarded by the fixed threshold
encoder are just outside $p$, having characteristics just less than
typical, for which some of the distortions $d(w, z_i)$ might be only
marginally greater than any fixed $d^*$.
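To give a rough sense of how much the change from $[(\ln n)/n]^{1/2}$ to $(\ln n)/n$ matters at practical block lengths, the two rate functions can be tabulated side by side. The sketch below sets both leading coefficients to one, which is an assumption made only for illustration; the actual coefficients depend on the source and channel statistics.

```python
import math

def threshold_encoder_rate(n):
    """[(ln n)/n]^(1/2): approach rate quoted for the fixed threshold encoder."""
    return math.sqrt(math.log(n) / n)

def optimum_encoder_rate(n):
    """(ln n)/n: approach rate quoted for the optimum encoder."""
    return math.log(n) / n

if __name__ == "__main__":
    for n in (10, 100, 10**4, 10**6):
        slow = threshold_encoder_rate(n)
        fast = optimum_encoder_rate(n)
        print(f"n={n:>8}  threshold={slow:.3e}  optimum={fast:.3e}  ratio={fast / slow:.3e}")
```

The ratio of the two terms is itself $[(\ln n)/n]^{1/2}$, so the relative advantage of the optimum encoder grows with the block length.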

16.2 The Special Case of a Double Uniform Source 

There is one situation for which both source encoders provide a
representation distortion that approaches the limit $d_c$ as $(\ln n)/n$.
This is when the source S is doubly uniform. Since $\mu(s, q)$ is independent
of $q$ for such a source, $R(d^*, q)$ in equation 61 is also independent of $q$,
with the result that the set $Q'$ in equation 66 is always empty.
Therefore, $\Pr(Q') = 0$ in equation 69 and we have for the set of upper
bounds to representation distortion, using threshold encoders:

$$d(S) \le d^* + (d_{\max} - d^*) \exp\left(-h \exp\{n[C - R(d^*)]\}\right).$$

In this bound we have used the lower bound in equation 92 rather
than that in equation 61. It can now be shown, using precisely the
same procedure as before, that this set of bounds approaches the
limit $d_c$ as $(\ln n)/n$.

XVII. SUMMARY 

We have presented upper and lower bounds to the minimum attainable
transmission distortion of a source measured by a specified
distortion measure. The bounds, which were derived for both noisy
and noiseless channels, have all been shown to converge to the same
level of distortion, $d_c$, algebraically in the block length $n$. The
quantity $d_c$ is that level of distortion shown by Shannon to be the
minimum attainable transmission distortion when the channel capacity is
$C$ and arbitrarily complex transmission methods are allowed.

For noisy channels, the rate of approach of the lower bound to $d_c$
is as $a/n$ and that of the upper bound as $b[(\ln n)/n]^{1/2}$. The
nonnegative coefficients $a$ and $b$ are both functions of the statistics of
the source and channel, but have different forms. The lower bound
coefficient, $a$, interrelates these statistics in such a way as to suggest
its utility as a measure of "mismatch" between the source and channel:
the larger $a$, the slower the rate of approach of the bound to $d_c$ and
the larger the source-channel mismatch. This coefficient is, of course,
necessarily equal to zero whenever the source and channel are
perfectly matched, that is, whenever the minimum attainable
transmission distortion is equal to $d_c$ for all block lengths $n$.

The coefficient $b$ in the upper bound, though, does not present an
indicator of source-channel mismatch. It is the sum of two terms
which separately contain the source statistics and the channel
statistics. The cause of this separation is the interface between the
source and channel that results from the use of a transmitting signal
set constrained to contain at most $e^{nC}$ members, a constraint which
we found necessary to introduce in the development of the bound.

For noiseless channels, both the upper and lower bounds to the
transmission distortion (or the source representation distortion)
have the same form. They both have been shown to approach the
asymptote $d_c$ as $a_1 (\ln n)/n$.